Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 1 day ago • 111
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 8 days ago • 49
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams Paper • 2603.07392 • Published 10 days ago • 17
XSkill: Continual Learning from Experience and Skills in Multimodal Agents Paper • 2603.12056 • Published 6 days ago • 28
EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery Paper • 2603.08127 • Published 9 days ago • 12
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning Paper • 2603.08655 • Published 9 days ago • 3
AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery Paper • 2603.07300 • Published 11 days ago • 16
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 15 days ago • 93
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? Paper • 2603.03194 • Published 15 days ago • 55
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 19 days ago • 90
Tool Verification for Test-Time Reinforcement Learning Paper • 2603.02203 • Published 16 days ago • 6
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published 19 days ago • 86