LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis Paper • 2605.30434 • Published 7 days ago • 18
SEAL: Synergistic Co-Evolution of Agents and Learning Environments Paper • 2605.24426 • Published 12 days ago • 10
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models Paper • 2605.20177 • Published 16 days ago • 10
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 15 days ago • 204
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published 16 days ago • 185
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization Paper • 2605.13641 • Published 22 days ago • 50
Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 28 days ago • 233