V_{0.5}: Generalist Value Model as a Prior for Sparse RL Rollouts • Paper • 2603.10848 • Published 23 days ago • 14
Flash-KMeans: Fast and Memory-Efficient Exact K-Means • Paper • 2603.09229 • Published 24 days ago • 82
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification • Paper • 2508.05629 • Published Aug 7, 2025 • 190
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates • Paper • 2502.06772 • Published Feb 10, 2025 • 22