ACO: Agent Cost Optimizer

Status: Core finding holds. Live agent validation blocked on Docker daemon access.

Static cascade routing cuts total cost by 56% relative to frontier-with-retry, at statistically equivalent quality, on SWE-bench Verified.


What This Is

A drop-in Python wrapper that selects cheaper models for most coding tasks and escalates to frontier models only when needed. Three-line integration.

The Core Result

Strategy                   Solved            Cost      $/Solved
Cascade T1→T2→T4           416/500 (83.2%)   $86.33    $0.2075
Always frontier (Claude)   391/500 (78.2%)   $158.34   $0.4050
Frontier with retry        420/500 (84.0%)   $196.77   $0.4685

Cascade costs 56% less than frontier with retry ($86.33 vs $196.77) at statistically equivalent quality. The 4-instance gap to frontier-retry is within noise (95% CI [-2.8pp, +1.0pp]). The cost difference is significant.

Analysis: 500 SWE-bench Verified instances, oracle data from 4 models.
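
The arithmetic behind the table can be rechecked with a few lines of Python. This is only a sketch: the figures are copied from the table above, and the names `RESULTS`, `cost_per_solved`, and `savings_vs` are illustrative, not part of the `aco` package.

```python
# Figures copied from the results table: (instances solved, total cost in USD).
RESULTS = {
    "cascade": (416, 86.33),
    "always_frontier": (391, 158.34),
    "frontier_retry": (420, 196.77),
}
TOTAL_INSTANCES = 500


def cost_per_solved(solved: int, cost: float) -> float:
    """Total cost divided by the number of solved instances."""
    return cost / solved


def savings_vs(baseline: str, candidate: str = "cascade") -> float:
    """Fractional reduction in total cost of `candidate` relative to `baseline`."""
    return 1.0 - RESULTS[candidate][1] / RESULTS[baseline][1]


for name, (solved, cost) in RESULTS.items():
    rate = solved / TOTAL_INSTANCES
    print(f"{name:16s} {rate:6.1%}  ${cost_per_solved(solved, cost):.4f}/solved")

print(f"savings vs frontier_retry:  {savings_vs('frontier_retry'):.1%}")
print(f"savings vs always_frontier: {savings_vs('always_frontier'):.1%}")
```

The 56% headline figure corresponds to the comparison against frontier with retry; measured against a single frontier pass, the same calculation gives roughly 45%.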


How It Works

T1 (deepseek-v4-flash) → T2 (gpt-5-mini) → T4 (claude-opus-4.7)
   316/500 resolve         43 more resolve        57 more resolve

No ML. No prediction. Model failure modes differ: cheap models solve some instances that frontier models miss.
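
The control flow of a static cascade can be sketched as follows. This is a minimal illustration, not the code in `aco/aco_live.py`: `Tier`, `run_cascade`, the `verify` callback, and the stub models are all hypothetical names, and the escalation rule (move to the next tier when a patch is missing or fails verification) is an assumption about what "only when needed" means.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Tier:
    name: str
    # Hypothetical interface: returns a candidate patch, or None on failure.
    solve: Callable[[str], Optional[str]]


def run_cascade(
    task: str,
    tiers: Sequence[Tier],
    verify: Callable[[str, str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try tiers in fixed order; return (tier name, patch) for the first verified patch."""
    for tier in tiers:
        patch = tier.solve(task)
        if patch is not None and verify(task, patch):
            return tier.name, patch
    # Every tier failed: nothing to return.
    return None, None


def make_stub(name: str, solvable: set) -> Tier:
    """Stand-in for a real model call: 'solves' only the tasks listed in `solvable`."""
    return Tier(name, lambda task: f"patch-from-{name}" if task in solvable else None)


tiers = [
    make_stub("T1", {"task-a"}),
    make_stub("T2", {"task-a", "task-b"}),
    make_stub("T4", {"task-c"}),
]


def always_ok(task: str, patch: str) -> bool:
    """Placeholder verifier; real verification would run the patch in Docker."""
    return True


print(run_cascade("task-b", tiers, always_ok))
```

In this toy run, `task-b` falls through T1 and is resolved at T2, so T4 is never called. The ordering is fixed in advance, which is what makes the routing static rather than learned.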


What's Confirmed vs What Was Wrong

Confirmed:

  • Cascade saves ~56% at statistically equivalent quality
  • Provider routing adds $18 savings (Bedrock for T4)
  • Model diversity is real: 10 instances were solved by T1/T2 but by neither T4 model
  • ML routing doesn't beat static cascade (oracle gap too narrow)
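
The model-diversity count is a set difference over per-model resolution data. The sketch below shows how such a count could be computed; the instance IDs and model keys are toy placeholders, not the project's oracle data, and `solved_only_by_cheap` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set


def solved_only_by_cheap(
    solved_by: Dict[str, Set[str]],
    cheap: Iterable[str],
    frontier: Iterable[str],
) -> Set[str]:
    """Instances solved by at least one cheap model and by no frontier model."""
    cheap_solved = set().union(*(solved_by[m] for m in cheap))
    frontier_solved = set().union(*(solved_by[m] for m in frontier))
    return cheap_solved - frontier_solved


# Toy data for illustration only: model key -> set of resolved instance IDs.
toy_solved_by = {
    "t1": {"i1", "i2", "i3"},
    "t2": {"i2", "i4"},
    "t4_a": {"i1", "i2", "i5"},
    "t4_b": {"i2", "i4", "i6"},
}

only_cheap = solved_only_by_cheap(toy_solved_by, ["t1", "t2"], ["t4_a", "t4_b"])
print(sorted(only_cheap))
```

Run against the real oracle data for all four models, the size of this set would be the figure quoted in the list above.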

Previously wrong (corrected in May 2026 review):

  • $585 provider savings → $18.16 (unit error)
  • Cascade solves more than frontier → solves 4 fewer than frontier-retry (within noise)
  • BERT 5-class router useful → predicts majority class, effectively random
  • Macro tools save $44 → overly optimistic

Blocking issue: Patch verification requires Docker containers. Trace simulation proves the cascade would save money; only live Docker runs prove it does.


Useful Files

File                                  Purpose
aco/aco_live.py                       Drop-in wrapper (3 strategies)
aco/per_step_router.py                Per-step command routing
cascade_agent.py                      Docker agent (ready, needs Docker daemon)
CORRECTED_REPORT.md                   Full corrected analysis
docs/literature_review.md             Cost optimization literature
docs/trained_router_final_report.md   Why ML routing failed

Repo Map

aco/                  - Production code (the product)
training/             - Evaluation & training scripts
docs/                 - Reports and documentation
eval/                 - Evaluation results
router_models/        - Archived routers (failures, kept for reference)
cascade_agent.py      - Docker cascade agent
CORRECTED_REPORT.md   - Corrected analysis after all fixes
FIXES_COMPLETE.md     - Summary of 5 critical-review fixes

Next Steps

  1. Run cascade agent on 50 instances with Docker → verify patches
  2. Add gpt-5.2-medium as additional cascade tier
  3. Adaptive repo routing for repos where frontier dominates
  4. Live safe-proposal model validation