ACO — Agent Cost Optimizer

Status: Core finding holds. Live agent validation blocked on Docker daemon access.

Static cascade routing costs 56% less than frontier with retry at statistically equivalent quality on SWE-bench Verified.

What This Is

A drop-in Python wrapper that sends most coding tasks to cheaper models and escalates to frontier models only when the cheaper tiers fail. Integration takes three lines.

The Core Result

Strategy	Solved	Cost	$/Solved
Cascade T1→T2→T4	416/500 (83.2%)	$86.33	$0.2075
Always frontier (Claude)	391/500 (78.2%)	$158.34	$0.4050
Frontier with retry	420/500 (84.0%)	$196.77	$0.4685

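Cascade costs 56% less than frontier with retry ($86.33 vs $196.77) at statistically equivalent quality. Against a single frontier pass it costs about 45% less ($86.33 vs $158.34) and solves 25 more instances. The 4-instance gap to frontier with retry is within noise (95% CI for the difference in resolve rate: [-2.8pp, +1.0pp]). The cost difference is significant.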
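The cost figures can be re-derived from the three table rows. The short check below uses only the numbers in the table; the dictionary keys and function names are illustrative and are not part of the ACO code.

```python
# Figures copied from the results table above.
results = {
    "cascade": {"solved": 416, "cost": 86.33},
    "always_frontier": {"solved": 391, "cost": 158.34},
    "frontier_retry": {"solved": 420, "cost": 196.77},
}


def cost_per_solved(name):
    """Total cost divided by the number of solved instances."""
    row = results[name]
    return row["cost"] / row["solved"]


def savings_vs(baseline, candidate="cascade"):
    """Fractional cost reduction of `candidate` relative to `baseline`."""
    return 1 - results[candidate]["cost"] / results[baseline]["cost"]


for name in results:
    print(f"{name}: ${cost_per_solved(name):.4f} per solved instance")

print(f"savings vs frontier_retry:  {savings_vs('frontier_retry'):.1%}")   # 56.1%
print(f"savings vs always_frontier: {savings_vs('always_frontier'):.1%}")  # 45.5%
```

The 56% headline figure uses frontier with retry as the baseline, which is the strategy whose solve rate the cascade matches.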

Analysis: 500 SWE-bench Verified instances, oracle data from 4 models.
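This README does not say how the 95% interval was computed. One common way to compare two strategies on the same instances is a paired bootstrap over per-instance outcomes. The sketch below assumes that approach. The function name, arguments, and defaults are hypothetical, and it needs per-instance resolved/unresolved flags, which are not reproduced here.

```python
import random


def paired_bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile CI for mean(a) - mean(b) under paired resampling.

    `a` and `b` hold per-instance outcomes (1 = resolved, 0 = not) for two
    strategies on the same instances, in the same order.
    """
    if len(a) != len(b):
        raise ValueError("a and b must cover the same instances")
    n = len(a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample instance indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] - b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `paired_bootstrap_ci(cascade_flags, retry_flags)` with two 500-element lists returns the interval as fractions; multiply by 100 for percentage points. An interval that contains zero is what "within noise" means here.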

How It Works

T1 (deepseek-v4-flash) → T2 (gpt-5-mini) → T4 (claude-opus-4.7)
   316/500 resolve         43 more resolve        57 more resolve

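No ML. No prediction. Each instance goes to T1 first and moves up a tier only when the current tier fails to resolve it. Model failure modes are partly complementary: cheap models resolve some instances that frontier models miss.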
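The escalation logic can be sketched in a few lines. This is a minimal illustration, not the code in `aco/aco_live.py`: the `Tier` class, `run_cascade`, and the stub `attempt` callables are hypothetical. The success check is assumed to live inside `attempt`; in a live run it would be patch verification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Tier:
    name: str
    # Hypothetical callable: returns a verified patch, or None on failure.
    attempt: Callable[[str], Optional[str]]


def run_cascade(
    task: str, tiers: Sequence[Tier]
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Try tiers in order and stop at the first one that returns a patch."""
    tried: List[str] = []
    for tier in tiers:
        tried.append(tier.name)
        patch = tier.attempt(task)
        if patch is not None:
            return tier.name, patch, tried
    return None, None, tried


# Stub tiers for illustration only; real tiers would call the models.
tiers = [
    Tier("deepseek-v4-flash", lambda task: None),
    Tier("gpt-5-mini", lambda task: "patch" if "easy" in task else None),
    Tier("claude-opus-4.7", lambda task: "patch"),
]

winner, patch, tried = run_cascade("easy bug", tiers)
print(winner, tried)  # gpt-5-mini ['deepseek-v4-flash', 'gpt-5-mini']
```

Because the order is fixed, the only per-instance decision is whether the current tier succeeded. That is why no learned router is needed.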

What's Confirmed vs What Was Wrong

Confirmed:

Cascade costs ~56% less than frontier with retry at statistically equivalent quality
Provider routing saves a further $18 (Bedrock for T4)
Model diversity is real: 10 instances are solved by T1 or T2 but by neither T4 model
ML routing doesn't beat static cascade (oracle gap too narrow)

Previously wrong (corrected in May 2026 review):

$585 provider savings → $18.16 (unit error)
Cascade solves more than frontier → solves 4 fewer than frontier-retry (within noise)
BERT 5-class router useful → predicts majority class, effectively random
Macro tools save $44 → overly optimistic

Blocking issue: Patch verification requires Docker containers, and the Docker daemon is not currently accessible. Trace simulation shows the cascade would save money; only live Docker runs can confirm that it does.

Useful Files

File	Purpose
`aco/aco_live.py`	Drop-in wrapper (3 strategies)
`aco/per_step_router.py`	Per-step command routing
`cascade_agent.py`	Docker agent (ready, needs Docker daemon)
`CORRECTED_REPORT.md`	Full corrected analysis
`docs/literature_review.md`	Cost optimization literature
`docs/trained_router_final_report.md`	Why ML routing failed

Repo Map

aco/                  — Production code (the product)
training/             — Evaluation & training scripts
docs/                 — Reports and documentation
eval/                 — Evaluation results
router_models/        — Archived routers (failures, kept for reference)
cascade_agent.py      — Docker cascade agent
CORRECTED_REPORT.md   — Corrected analysis after all fixes
FIXES_COMPLETE.md     — Summary of 5 critical-review fixes

Next Steps

Run cascade agent on 50 instances with Docker → verify patches
Add gpt-5.2-medium as additional cascade tier
Adaptive repo routing for repos where frontier dominates
Live safe-proposal model validation
