arxiv:2606.04056

Token Budgets: An Empirical Catalog of 63 LLM-Agent Budget-Overrun Incidents, with an Affine-Typed Rust Mitigation as a Case Study

Published on Jun 2 · Submitted by Sajjad Khan on Jun 4

Authors: Sajjad Khan

Abstract

LLM-agent budget overruns are a documented production failure class: a single retry loop can spend thousands of dollars before an operator notices, and the in-process integrity properties that would prevent it (no aliasing, no double-spend, no use-after-delegation of a cost-bearing value) are enforced, if at all, by ad-hoc wrappers rather than by the type system. Our central contribution is empirical: a catalog of 63 confirmed production incidents from 21 orchestration frameworks (2023-2026), each backed by a quoted GitHub issue and, where reported, a dollar loss, organized into an eight-cluster failure taxonomy (inter-rater Cohen's kappa = 0.837, N = 113), plus 47 supplementary structural entries. As one mitigation evaluated against this taxonomy, we build token-budgets, a 1,180-line Rust crate (no unsafe) that operationalizes affine ownership so that cloning, double-spending, or using a budget after delegating it are compile errors rather than runtime hazards an operator must remember to avoid. The dollar cap is runtime arithmetic under an estimator assumption; the affine layer makes that arithmetic non-bypassable. On single-agent workloads a 4-line Python counter matches the crate at 0/30 overshoot, so the distinguishing value is non-bypassability under operator error in multi-agent delegation: the delegation-fanout race documented in 11 incidents is rejected by the borrow checker at compile time, while the same pattern under asyncio overshoots 30/30 and three disciplined alternatives overshoot 0/30. Across five runtimes, three providers, and a temperature-stratified live-API test (N = 160), the approach reports zero cap violations and zero false refusals, at operational parity with concurrent work. Static reservation over-reserves by 4-6x (2.11x with adaptive reservation). Cap soundness at the level of the compiled, running binary is left open.
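A minimal sketch of the affine idea described above, assuming costs are tracked as integer micro-dollars and that each call's cost is supplied as an upper-bound estimate (the estimator assumption). `Budget`, `Refused`, and `spend` are hypothetical names chosen for illustration, not the token-budgets crate's API. Because `Budget` implements neither `Clone` nor `Copy` and `spend` takes `self` by value, reusing a spent handle is a move error caught by the compiler, while the cap check itself remains ordinary runtime arithmetic.

```rust
/// Hypothetical affine budget handle (illustrative; not the token-budgets API).
/// It deliberately derives neither `Clone` nor `Copy`, so a handle cannot be
/// duplicated and every spend moves it.
#[derive(Debug)]
pub struct Budget {
    remaining_micros: u64, // remaining cap in micro-dollars (assumed unit)
}

/// Returned when an estimated charge exceeds what is left. It hands the
/// untouched budget back so that a refusal does not destroy it.
#[derive(Debug)]
pub struct Refused {
    pub budget: Budget,
    pub requested_micros: u64,
}

impl Budget {
    pub fn new(cap_micros: u64) -> Self {
        Budget { remaining_micros: cap_micros }
    }

    pub fn remaining(&self) -> u64 {
        self.remaining_micros
    }

    /// Consumes `self` and returns the successor handle. `estimated_micros` is
    /// assumed to be an upper-bound cost estimate for the next LLM call; the
    /// cap only holds if that estimator assumption holds.
    pub fn spend(self, estimated_micros: u64) -> Result<Budget, Refused> {
        match self.remaining_micros.checked_sub(estimated_micros) {
            Some(left) => Ok(Budget { remaining_micros: left }),
            None => Err(Refused { budget: self, requested_micros: estimated_micros }),
        }
    }
}

// Double-spend attempt, rejected at compile time:
//   let b = Budget::new(1_000);
//   let after = b.spend(300);
//   let again = b.spend(300); // error[E0382]: use of moved value: `b`
```

Returning the untouched handle inside `Refused` is one design choice among several: a refusal leaves the remaining budget usable instead of silently dropping it.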

View arXiv page View PDF GitHub 1 Add to collection

Community

sajjadanwar0 · Paper author · Paper submitter · 2 days ago

The core contribution is empirical, not the Rust crate: a catalog of 63 confirmed LLM-agent budget-overrun incidents across 21 orchestration frameworks (2023–2026), each backed by a quoted GitHub issue and (where reported) a dollar loss, classified at two-rater κ = 0.837 (N = 113). The crate is one case-study mitigation: affine ownership that makes clone, double-spend, and use-after-delegation compile errors. Honest finding: on single-agent workloads a 4-line Python counter ties it at 0/30 overshoot; the affine type only pulls ahead under operator error in multi-agent delegation (the fan-out race is rejected by the borrow checker, while asyncio overshoots 30/30). Binary-level cap soundness is left open. Full artifact (catalog CSV, crate, proofs, reproduce.sh): https://github.com/sajjadanwar0/token-budgets
Feedback on the taxonomy is welcome.
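The fan-out case can be sketched in the same hypothetical style. Instead of sharing one mutable counter among concurrent sub-agents, the parent budget is split into child budgets that are moved into worker threads, so each child is owned by exactly one thread and the parent handle no longer exists after delegation. `Budget`, `split`, `try_spend`, and `fan_out` are illustrative names under these assumptions, not the crate's interface, and an even split is assumed for simplicity.

```rust
use std::thread;

/// Hypothetical affine budget (illustrative only; not the token-budgets API).
/// No `Clone`/`Copy`: the only way to give a sub-agent spending power is to
/// carve it out of the parent, which consumes the parent handle.
#[derive(Debug)]
pub struct Budget {
    remaining_micros: u64,
}

impl Budget {
    pub fn new(cap_micros: u64) -> Self {
        Budget { remaining_micros: cap_micros }
    }

    pub fn remaining(&self) -> u64 {
        self.remaining_micros
    }

    /// Splits the parent into `n` equal child budgets plus the remainder.
    /// Consumes `self`, so the parent cannot be spent after delegation.
    pub fn split(self, n: u64) -> (Vec<Budget>, Budget) {
        assert!(n > 0, "need at least one child");
        let share = self.remaining_micros / n;
        let children: Vec<Budget> =
            (0..n).map(|_| Budget { remaining_micros: share }).collect();
        let rest = Budget { remaining_micros: self.remaining_micros - share * n };
        (children, rest)
    }

    /// Deducts `cost` if it fits; otherwise refuses (returns false) rather
    /// than overshooting. Safe without locks because the caller owns `self`.
    pub fn try_spend(&mut self, cost: u64) -> bool {
        if cost <= self.remaining_micros {
            self.remaining_micros -= cost;
            true
        } else {
            false
        }
    }
}

/// Fans out to `n` worker threads; each repeatedly attempts a call costing
/// `cost` micro-dollars until its own child budget refuses. Returns the total
/// spent across all workers.
pub fn fan_out(cap_micros: u64, n: u64, cost: u64) -> u64 {
    assert!(cost > 0, "a zero-cost call would loop forever");
    let parent = Budget::new(cap_micros);
    let (children, _rest) = parent.split(n);
    // `parent` has been moved; `parent.try_spend(1)` here is error[E0382].
    let handles: Vec<_> = children
        .into_iter()
        .map(|mut child| {
            thread::spawn(move || {
                let start = child.remaining();
                while child.try_spend(cost) {}
                start - child.remaining()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}
```

With a 1,000 micro-dollar cap, four workers, and 30 micro-dollars per call, each child receives 250 and stops after 8 calls, so the total is 960 regardless of thread scheduling; the cap holds because no two threads ever hold the same budget.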
