Advancing Creative Physical Intelligence in Large Multimodal Models Paper • 2605.26396 • Published 11 days ago • 19
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published 16 days ago • 50
Useful Memories Become Faulty When Continuously Updated by LLMs Paper • 2605.12978 • Published 23 days ago • 18
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards Paper • 2605.10899 • Published 25 days ago • 78
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph Paper • 2511.00086 • Published Oct 29, 2025 • 42
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 137
MIRIX: Multi-Agent Memory System for LLM-Based Agents Paper • 2507.07957 • Published Jul 10, 2025 • 80
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published Jun 6, 2025 • 73
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30, 2025 • 97