arxiv:2605.18401

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

Published on May 18 · Submitted by Hongyi Liu on May 19 · #1 Paper of the day

Abstract

SkillsVote is a governance framework for long-horizon LLM agents that manages reusable skills through structured collection, recommendation, and evolution processes.

AI-generated summary

Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneven, environment-sensitive artifacts, and indiscriminate updates can pollute future context. We present SkillsVote, a lifecycle-governance framework for Agent Skills that spans collection, recommendation, and evolution. SkillsVote profiles a million-scale open-source corpus for environment requirements, quality, and verifiability, then synthesizes tasks for verifiable skills. Before execution, SkillsVote performs agentic library search over a structured skill library to expose instructional skill context. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill use, agent exploration, environment, and result signals, and admits only successful, reusable discoveries to evidence-gated updates. In our evaluation, offline evolution improves GPT-5.2 on Terminal-Bench 2.0 by up to 7.9 pp, while online evolution improves SWE-Bench Pro by up to 2.6 pp. Overall, governed external skill libraries can improve frozen agents without model updates when systems control exposure, credit, and preservation.
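The post-execution step described above (attribute each subtask outcome, then admit only successful, reusable discoveries once evidence supports them) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `Cause`, `Subtask`, `SkillLibrary`, and `propose`, the `min_evidence` threshold, and the choice to treat only outcomes attributed to agent exploration as candidate discoveries.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Cause(Enum):
    """Hypothetical labels mirroring the four attribution targets in the abstract."""
    SKILL_USE = "skill_use"
    AGENT_EXPLORATION = "agent_exploration"
    ENVIRONMENT = "environment"
    RESULT = "result"


@dataclass
class Subtask:
    skill_id: Optional[str]   # skill this subtask is linked to, if any
    cause: Cause              # what the outcome is attributed to
    succeeded: bool
    reusable: bool            # whether the discovery applies beyond this run
    note: str = ""            # free-text evidence from the trajectory


@dataclass
class SkillLibrary:
    min_evidence: int = 2     # assumed gate; the paper's criterion may differ
    evidence: Dict[str, List[str]] = field(default_factory=dict)
    admitted: Dict[str, List[str]] = field(default_factory=dict)

    def propose(self, sub: Subtask) -> bool:
        """Record a candidate update and return True once it is admitted."""
        # Reject failures and one-off findings outright.
        if not (sub.succeeded and sub.reusable):
            return False
        # Assumption: only agent-exploration outcomes count as new discoveries.
        if sub.cause is not Cause.AGENT_EXPLORATION or sub.skill_id is None:
            return False
        notes = self.evidence.setdefault(sub.skill_id, [])
        notes.append(sub.note)
        # Evidence gate: admit only after enough independent observations.
        if len(notes) >= self.min_evidence:
            self.admitted[sub.skill_id] = list(notes)
            return True
        return False
```

With `min_evidence=2`, a first successful, reusable exploration for a skill is only recorded, and the update is admitted when a second one arrives. Outcomes attributed to the environment or to failed runs never reach the library, which is one simple way to keep noisy trajectories from polluting future context.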

Community


We introduce SkillsVote, a lifecycle governance framework for Agent Skills. Instead of treating long-horizon agent trajectories as disposable traces, SkillsVote converts them into reusable, executable skills with procedural guidance, while controlling quality, redundancy, environment sensitivity, and unsafe updates.

SkillsVote covers the full skill lifecycle: profiling a million-scale open-source skill corpus, recommending relevant skills before execution, attributing post-execution outcomes to skill use, exploration, environment, and result signals, and admitting only successful reusable discoveries through evidence-gated evolution. Experiments show that governed external skill libraries can improve frozen LLM agents without model updates, achieving up to +7.9 pp on Terminal-Bench 2.0 and +2.6 pp on SWE-Bench Pro.
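The pre-execution recommendation step can be pictured with a small Python sketch. It is not the paper's agentic library search: it swaps in a plain keyword-overlap ranking, preceded by an environment filter, purely to show the shape of the idea. The `Skill` fields, the `recommend` function, and the sample skills in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    requires: FrozenSet[str]   # hypothetical environment requirements, e.g. CLI tools


def recommend(task: str, env: FrozenSet[str], skills: List[Skill], k: int = 3) -> List[Skill]:
    """Return up to k skills that fit the environment, ranked by word overlap with the task."""
    task_words = set(task.lower().split())
    scored = []
    for skill in skills:
        # Skip skills whose requirements the current environment cannot satisfy.
        if not skill.requires <= env:
            continue
        overlap = len(task_words & set(skill.description.lower().split()))
        if overlap > 0:
            scored.append((overlap, skill.name, skill))
    # Highest overlap first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [skill for _, _, skill in scored[:k]]
```

For example, given a skill described as "find the commit that broke a test with git bisect" that requires `git`, and another that requires `docker`, calling `recommend("find which commit broke the test", frozenset({"git"}), skills)` returns only the first: the Docker skill is filtered out by the environment check before any ranking happens.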
[Figure: benchmark pass rates]
