Self-Improving Language Models with Bidirectional Evolutionary Search
Abstract
Bidirectional Evolutionary Search combines forward candidate evolution with backward goal decomposition to improve language model generation by overcoming limitations of traditional search methods.
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, restricting exploration to regions with substantial model probability mass. To address these, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell while evolutionary operators can escape it, and that backward search can exponentially reduce the number of required samples to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains, and on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.
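The interplay between the two directions can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal illustration under loud assumptions: a hidden bit string stands in for the task, uniform random bit strings stand in for model rollouts, and a span-equality check stands in for a subgoal verifier. All names (`GOAL`, `decompose`, `check`, `recombine`, `toy_search`) are hypothetical. A uniform rollout matches the full 12-bit goal with probability 2^-12, while each 3-bit subgoal span is matched with probability 2^-3, which is what makes the dense per-subgoal feedback useful in this toy setting.

```python
import random

# Hypothetical hidden target; in a real task the verifier would not expose it.
GOAL = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0)

def decompose(length, parts):
    """Backward step (toy): split the task into contiguous subgoal spans."""
    size = length // parts
    return [(i * size, length if i == parts - 1 else (i + 1) * size)
            for i in range(parts)]

def check(candidate, span):
    """Toy subgoal verifier: is this span of the candidate correct?"""
    a, b = span
    return candidate[a:b] == GOAL[a:b]

def dense_score(candidate, subgoals):
    """Dense feedback: number of subgoals the candidate satisfies."""
    return sum(check(candidate, s) for s in subgoals)

def rollout(rng, length):
    """Forward expansion (toy): stand-in for one model rollout."""
    return tuple(rng.randint(0, 1) for _ in range(length))

def recombine(p1, p2, subgoals):
    """Evolution operator (toy): keep p1's span where it passes, else take p2's."""
    child = []
    for span in subgoals:
        a, b = span
        source = p1 if check(p1, span) else p2
        child.extend(source[a:b])
    return tuple(child)

def toy_search(parts=4, pop_size=16, generations=200, seed=0):
    rng = random.Random(seed)
    subgoals = decompose(len(GOAL), parts)
    population = [rollout(rng, len(GOAL)) for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=lambda c: dense_score(c, subgoals), reverse=True)
        if dense_score(population[0], subgoals) == len(subgoals):
            return population[0], gen  # every subgoal is satisfied
        parents = population[: pop_size // 2]
        fresh = [rollout(rng, len(GOAL)) for _ in range(pop_size - len(parents))]
        # Recombine surviving partial solutions with fresh rollouts.
        children = [recombine(rng.choice(parents), f, subgoals) for f in fresh]
        population = parents + children
    best = max(population, key=lambda c: dense_score(c, subgoals))
    return best, generations

if __name__ == "__main__":
    best, gens = toy_search()
    print(best == GOAL, gens)
```

In this sketch, recombination can assemble a full solution from spans that no single rollout produced together, and the subgoal checks decide which spans survive. A sparse verifier that only accepts the complete string would give no signal until a rollout happened to match all 12 bits at once.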
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MARS$^2$: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation (2026)
- OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search (2026)
- IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning (2026)
- Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents (2026)
- Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning (2026)
- Peer-Predictive Self-Training for Language Model Reasoning (2026)
- CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning (2026)
Made an audio walkthrough of this paper for anyone who wants to skim it on the go:
https://researchpod.app/episode/8833bb63-a455-4baf-a272-70a561a19a8c
Generated automatically by ResearchPod — happy to take feedback from the authors.
Get this paper in your agent:
hf papers read 2605.28814
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 4
Xkev/gemma-3-1b-it-kk-bes