ACO: Agent Cost Optimizer
Status: Core finding holds. Live agent validation blocked on Docker daemon access.
Static cascade routing cuts cost by 56% relative to frontier-with-retry, at statistically equivalent quality, on SWE-bench Verified.
What This Is
A drop-in Python wrapper that selects cheaper models for most coding tasks and escalates to frontier models only when needed. Three-line integration.
The Core Result
| Strategy | Solved | Cost | $/Solved |
|---|---|---|---|
| Cascade T1→T2→T4 | 416/500 (83.2%) | $86.33 | $0.2075 |
| Always frontier (Claude) | 391/500 (78.2%) | $158.34 | $0.4050 |
| Frontier with retry | 420/500 (84.0%) | $196.77 | $0.4685 |
Cascade saves 56% relative to frontier with retry (about 45% relative to always-frontier) at statistically equivalent quality. The 4-instance gap to frontier-retry is within noise (95% CI [-2.8pp, +1.0pp]). The cost difference is significant.
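The per-solve costs and the savings figure can be re-derived from the table. A quick check using only the totals above (the variable and function names are illustrative, not part of the repo):

```python
# Totals copied from the results table: (instances solved, total cost in USD).
results = {
    "cascade": (416, 86.33),
    "always_frontier": (391, 158.34),
    "frontier_retry": (420, 196.77),
}

# Cost per solved instance for each strategy.
cost_per_solved = {name: cost / solved for name, (solved, cost) in results.items()}


def savings(cheap: str, baseline: str) -> float:
    """Fractional cost reduction of `cheap` relative to `baseline`."""
    return 1 - results[cheap][1] / results[baseline][1]


for name, value in cost_per_solved.items():
    print(f"{name}: ${value:.4f} per solved instance")
print(f"vs frontier with retry: {savings('cascade', 'frontier_retry'):.1%}")
print(f"vs always frontier:     {savings('cascade', 'always_frontier'):.1%}")
```

The 56% figure is the saving against frontier with retry; against a single frontier pass the same arithmetic gives roughly 45%.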
Analysis: 500 SWE-bench Verified instances, oracle data from 4 models.
How It Works
T1 (deepseek-v4-flash) → T2 (gpt-5-mini) → T4 (claude-opus-4.7)
| Tier | Model | Resolved |
|---|---|---|
| T1 | deepseek-v4-flash | 316/500 |
| T2 | gpt-5-mini | 43 more |
| T4 | claude-opus-4.7 | 57 more |
No ML. No prediction. Model failure modes do not fully overlap: cheap models catch some instances that frontier models miss.
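The escalation logic can be sketched in a few lines. This is a minimal illustration under assumptions, not the API of `aco/aco_live.py`: `Tier`, `run_cascade`, the stub solvers, and the flat per-attempt costs are all hypothetical, and the sketch assumes each solver returns `None` when its patch fails verification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Tier:
    name: str                              # e.g. "T1"
    solve: Callable[[str], Optional[str]]  # returns a patch, or None on failure
    cost: float                            # hypothetical flat cost per attempt (USD)


def run_cascade(task: str, tiers: Sequence[Tier]) -> tuple[Optional[str], Optional[str], float]:
    """Try tiers in a fixed order and stop at the first one that returns a patch.

    Returns (patch, name of the tier that produced it, total cost of all attempts).
    """
    spent = 0.0
    for tier in tiers:
        spent += tier.cost  # every attempt is paid for, including failed ones
        patch = tier.solve(task)
        if patch is not None:
            return patch, tier.name, spent
    return None, None, spent


# Stand-in solvers for illustration only; real tiers would call model APIs
# and verify the resulting patch against the task's tests.
tiers = [
    Tier("T1", lambda t: "patch-from-T1" if "easy" in t else None, 0.05),
    Tier("T2", lambda t: "patch-from-T2" if "medium" in t else None, 0.20),
    Tier("T4", lambda t: "patch-from-T4", 1.00),
]

print(run_cascade("easy bug", tiers))
print(run_cascade("hard bug", tiers))
```

The order is static: nothing predicts which tier a task needs, so the price of a hard task is the sum of every failed cheaper attempt plus the frontier call.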
What's Confirmed vs What Was Wrong
Confirmed:
- Cascade saves ~56% at statistically equivalent quality
- Provider routing adds $18 savings (Bedrock for T4)
- Model diversity is real: 10 instances are solved by T1 or T2 but by neither T4 model
- ML routing doesn't beat static cascade (oracle gap too narrow)
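The diversity claim is a set difference over per-model oracle results. A toy sketch, assuming the oracle data can be reduced to one set of resolved instance IDs per model (the IDs and the `T4_first` / `T4_second` keys are made up for illustration):

```python
# Toy oracle data: instance IDs each model resolved (not real results).
resolved = {
    "T1": {"a", "b", "c"},
    "T2": {"b", "d"},
    "T4_first": {"a", "b", "e"},
    "T4_second": {"a", "e", "f"},
}

cheap = resolved["T1"] | resolved["T2"]
frontier = resolved["T4_first"] | resolved["T4_second"]

# Instances only the cheap tiers solve: the diversity a cascade benefits from.
cheap_only = cheap - frontier
print(sorted(cheap_only))
```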
Previously wrong (corrected in May 2026 review):
- $585 provider savings → $18.16 (unit error)
- Cascade solves more than frontier → solves 4 fewer than frontier-retry (within noise)
- BERT 5-class router useful → predicts majority class, effectively random
- Macro tools save $44 → overly optimistic
Blocking issue: Patch verification requires Docker containers. Trace simulation proves the cascade would save money; only live Docker runs prove it does.
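For context, a trace simulation of this kind replays recorded per-model outcomes instead of running agents. The sketch below assumes a hypothetical trace layout (`traces[instance][model] = (resolved, cost_usd)`) and uses made-up numbers; it is not the repo's evaluation script.

```python
def simulate_cascade(order, traces):
    """Replay a cascade over recorded per-model outcomes.

    traces[instance][model] = (resolved: bool, cost_usd: float)
    Returns (number of instances solved, total cost).
    """
    solved, total_cost = 0, 0.0
    for per_model in traces.values():
        for model in order:
            resolved, cost = per_model[model]
            total_cost += cost  # failed attempts still cost money
            if resolved:
                solved += 1
                break  # no escalation needed
    return solved, total_cost


# Made-up traces for three instances; not taken from the actual evaluation.
traces = {
    "inst-1": {"T1": (True, 0.02), "T2": (True, 0.10), "T4": (True, 0.40)},
    "inst-2": {"T1": (False, 0.03), "T2": (True, 0.12), "T4": (True, 0.45)},
    "inst-3": {"T1": (False, 0.03), "T2": (False, 0.15), "T4": (True, 0.50)},
}

print(simulate_cascade(["T1", "T2", "T4"], traces))
print(simulate_cascade(["T4"], traces))
```

The simulation treats the recorded `resolved` flag as known at escalation time. A live agent has to obtain that signal by actually verifying each patch, which is why the Docker runs matter.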
Useful Files
| File | Purpose |
|---|---|
| aco/aco_live.py | Drop-in wrapper (3 strategies) |
| aco/per_step_router.py | Per-step command routing |
| cascade_agent.py | Docker agent (ready, needs Docker daemon) |
| CORRECTED_REPORT.md | Full corrected analysis |
| docs/literature_review.md | Cost optimization literature |
| docs/trained_router_final_report.md | Why ML routing failed |
Repo Map
aco/ - Production code (the product)
training/ - Evaluation & training scripts
docs/ - Reports and documentation
eval/ - Evaluation results
router_models/ - Archived routers (failures, kept for reference)
cascade_agent.py - Docker cascade agent
CORRECTED_REPORT.md - Corrected analysis after all fixes
FIXES_COMPLETE.md - Summary of 5 critical-review fixes
Next Steps
- Run the cascade agent on 50 instances with Docker to verify patches
- Add gpt-5.2-medium as additional cascade tier
- Adaptive repo routing for repos where frontier dominates
- Live safe-proposal model validation