Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
Abstract
Diffusion large language models face a tradeoff between generation quality and reasoning-path exploration; this paper proposes a sampling method that explicitly balances these competing objectives.
Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality–exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and develop a simple Independent Metropolis–Hastings sampler that approximately targets this distribution during decoding. Experiments across a range of reasoning benchmarks, including MATH500, AIME24/25, HumanEval, and MBPP, show that our approach yields a better quality–exploration tradeoff than both random and low-confidence remasking.
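To make the low-confidence remasking baseline concrete, here is a minimal sketch of confidence-based decoding for a masked diffusion LM. This is not the paper's implementation: the `model` interface, tensor shapes, and commit schedule are all assumptions, and real decoders batch sequences and use learned noise schedules.

```python
import torch

def low_confidence_remask_decode(model, x, mask_id, num_steps):
    """Confidence-based decoding for a masked diffusion LM (sketch).

    x: 1-D LongTensor of token ids, with masked positions set to mask_id.
    model(x) is assumed to return per-position logits of shape (seq_len, vocab).
    Each step commits the most confident masked positions and re-masks the rest.
    """
    for step in range(num_steps):
        masked = x == mask_id
        if not masked.any():
            break
        probs = model(x).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)   # per-position confidence and argmax token
        # Only still-masked positions compete for commitment.
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        # Spread the remaining masked tokens evenly over the remaining steps.
        k = max(1, int(masked.sum()) // (num_steps - step))
        idx = conf.topk(k).indices
        x[idx] = pred[idx]
    return x
```

Because only the highest-confidence predictions are ever committed, repeated runs tend to collapse onto similar sequences, which is exactly the entropy constraint the abstract describes.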
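The paper's sampler is described as an Independent Metropolis–Hastings (IMH) procedure targeting a distribution that balances quality and exploration. The sketch below shows only the generic IMH accept/reject step in log space; `log_pi` (e.g., some tempered sequence likelihood) and the proposal `log_q` are placeholders, since the paper's exact target is not given on this page.

```python
import math
import random

def imh_step(x_cur, log_pi, log_q, sample_q):
    """One Independent Metropolis-Hastings step, computed in log space.

    log_pi(x): unnormalized log-density of the target distribution
               (assumed here to be a quality/exploration-balancing target).
    log_q(x):  log-density of the proposal, which is independent of x_cur.
    sample_q(): draws a fresh proposal, ignoring the current state.
    """
    x_prop = sample_q()
    # Independence-sampler acceptance ratio:
    #   a = min(1, pi(x_prop) * q(x_cur) / (pi(x_cur) * q(x_prop)))
    log_a = (log_pi(x_prop) - log_pi(x_cur)) - (log_q(x_prop) - log_q(x_cur))
    if random.random() < math.exp(min(0.0, log_a)):
        return x_prop, True   # accept the proposal
    return x_cur, False       # keep the current sequence
```

Running this step repeatedly leaves the chain's stationary distribution at the target, so the sampler can keep the proposal's quality bias while still visiting alternative sequences.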
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Improving Sampling for Masked Diffusion Models via Information Gain (2026)
- Locally Coherent Parallel Decoding in Diffusion Language Models (2026)
- Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models (2026)
- Breaking the Factorization Barrier in Diffusion Language Models (2026)
- Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching (2026)
- Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models (2026)
- T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization (2026)
