DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch
Abstract
A large-scale dataset called DeNovoSWE is introduced for training code agents to generate entire software repositories from documentation, significantly improving performance on long-horizon software engineering tasks.
As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, we introduce DeNovoSWE, a large-scale dataset for whole-repository generation. DeNovoSWE comprises 4,818 high-quality instances, where each instance requires generating a complete repository from documentation. Our dataset is automatically constructed through a carefully designed sandboxed agentic workflow, enabling scalable curation without human annotation. The workflow follows a "divide and conquer" and critic-repair philosophy. To balance data quality and diversity, we further introduce a difficulty-aware trajectory filtering strategy. Fine-tuning Qwen3-30B-A3B on DeNovoSWE substantially improves long-horizon SWE performance, raising its score on the challenging BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RepoZero: Can LLMs Generate a Code Repository from Scratch? (2026)
- D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery (2026)
- SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle (2026)
- RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades (2026)
- SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades (2026)
- RAT: RunAnyThing via Fully Automated Environment Configuration (2026)
- SWE-Explore: Benchmarking How Coding Agents Explore Repositories (2026)
Datasets citing this paper 3
AweAI-Team/DeNovoSWE-Trajectory-Filtered