VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions Paper • 2605.27141 • Published 26 days ago • 19
HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts Paper • 2605.13997 • Published May 13 • 5
Look Before You Leap: Autonomous Exploration for LLM Agents Paper • 2605.16143 • Published May 15 • 9
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards Paper • 2605.14539 • Published May 14 • 7
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation Paper • 2605.11739 • Published May 13 • 59
Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding Paper • 2605.02290 • Published May 4 • 42
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR Paper • 2605.15726 • Published May 15 • 34
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published Apr 2 • 101