LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval
Abstract
LaSER introduces a self-distillation framework that embeds explicit reasoning into dense retrievers' latent space through dual-view training and multi-grained alignment, enabling efficient reasoning without autoregressive generation.
LLMs have fundamentally transformed dense retrieval, upgrading backbones from discriminative encoders to generative architectures. However, a critical disconnect remains: while LLMs possess strong reasoning capabilities, current retrievers predominantly use them as static encoders, leaving their potential for complex reasoning untapped. Existing approaches address this with rewrite-then-retrieve pipelines that generate explicit chain-of-thought (CoT) rationales before retrieval, but such pipelines incur prohibitive latency. In this paper, we propose LaSER, a novel self-distillation framework that internalizes explicit reasoning into the latent space of dense retrievers. Operating on a shared LLM backbone, LaSER introduces a dual-view training mechanism: an Explicit view that encodes ground-truth reasoning paths, and a Latent view that performs implicit latent thinking. To bridge the gap between these views, we design a multi-grained alignment strategy. Beyond standard output alignment, we introduce a trajectory alignment mechanism that synchronizes the intermediate latent states of the latent path with the semantic progression of the explicit reasoning segments. This allows the retriever to think silently and effectively, without autoregressive text generation. Extensive experiments on both in-domain and out-of-domain reasoning-intensive benchmarks demonstrate that LaSER significantly outperforms state-of-the-art baselines. Analyses across diverse backbones and model scales further validate the robustness of our approach and confirm that the unified learning framework is essential for eliciting effective latent thinking. Our method combines the reasoning depth of explicit CoT pipelines with the inference efficiency of standard dense retrievers.
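To make the dual-view mechanism concrete, the sketch below shows one plausible way the training objective could be structured. It is a minimal illustration, not the authors' released code: the toy Transformer stands in for the shared LLM backbone, and all module names, hyperparameters, and the specific loss forms (in-batch contrastive retrieval, cosine output alignment, MSE trajectory alignment over segment means) are assumptions made for clarity.

```python
# Minimal sketch of LaSER-style dual-view training with multi-grained alignment.
# Everything here is illustrative: a small Transformer replaces the shared LLM
# backbone, and the loss terms approximate the paper's described objectives.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualViewRetriever(nn.Module):
    def __init__(self, vocab=32000, dim=256, n_latent=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)  # shared across both views
        self.latent_tokens = nn.Parameter(torch.randn(n_latent, dim) * 0.02)  # "silent thinking" slots

    def encode_explicit(self, query_ids, cot_ids):
        # Explicit view: query followed by the ground-truth reasoning path.
        h = self.backbone(self.embed(torch.cat([query_ids, cot_ids], dim=1)))
        return h[:, -1], h[:, query_ids.size(1):]   # final embedding, per-token CoT states

    def encode_latent(self, query_ids):
        # Latent view: query followed by learned latent tokens (no text generation).
        q = self.embed(query_ids)
        lat = self.latent_tokens.unsqueeze(0).expand(q.size(0), -1, -1)
        h = self.backbone(torch.cat([q, lat], dim=1))
        return h[:, -1], h[:, query_ids.size(1):]   # final embedding, latent trajectory


def laser_loss(model, query_ids, cot_ids, seg_bounds, doc_emb, tau=0.05):
    """seg_bounds: one (start, end) span per latent slot, marking the explicit
    reasoning segment each latent state should track (an assumed segmentation)."""
    e_exp, cot_states = model.encode_explicit(query_ids, cot_ids)
    e_lat, traj = model.encode_latent(query_ids)

    # (1) Retrieval loss: in-batch contrastive over positive document embeddings.
    sims = F.normalize(e_lat, dim=-1) @ F.normalize(doc_emb, dim=-1).T / tau
    l_ret = F.cross_entropy(sims, torch.arange(sims.size(0)))

    # (2) Output alignment: latent-view embedding mimics the explicit-view embedding.
    l_out = 1 - F.cosine_similarity(e_lat, e_exp.detach(), dim=-1).mean()

    # (3) Trajectory alignment: each latent state matches the mean hidden state
    #     of its corresponding explicit reasoning segment.
    seg_targets = torch.stack(
        [cot_states[:, s:e].mean(dim=1) for s, e in seg_bounds], dim=1)
    l_traj = F.mse_loss(traj, seg_targets.detach())

    return l_ret + l_out + l_traj
```

Note the self-distillation structure: the explicit view acts as the teacher, so its embeddings and segment states are detached, and only the latent (student) path receives alignment gradients; at inference time only `encode_latent` is needed, which is what removes the autoregressive generation cost.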
Community
Empowering dense retrievers with latent reasoning for high-performance, low-latency complex search.
Keywords: dense retrieval, latent reasoning, latent CoT, representation learning, efficiency.
Librarian Bot (automated): the following similar papers were recommended by the Semantic Scholar API.
- Forest Before Trees: Latent Superposition for Efficient Visual Reasoning (2026)
- ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought (2026)
- CrystaL: Spontaneous Emergence of Visual Latents in MLLMs (2026)
- Unleash the Potential of Long Semantic IDs for Generative Recommendation (2026)
- Reasoning-Augmented Representations for Multimodal Retrieval (2026)
- Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens (2026)
- LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning (2026)