SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 3 days ago • 207
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 3 days ago • 98
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 22 days ago • 330
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 13 days ago • 137
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 17 days ago • 28
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published 21 days ago • 77
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 25 days ago • 136
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas Paper • 2603.16448 • Published 25 days ago • 58
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published 25 days ago • 307
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 26 days ago • 184
SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering? Paper • 2603.15401 • Published 26 days ago • 18
In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published Mar 9 • 43
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams Paper • 2603.07392 • Published Mar 8 • 18
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning Paper • 2603.04918 • Published Mar 5 • 56
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? Paper • 2603.03194 • Published Mar 3 • 57
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published Feb 27 • 88