AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions Paper • 2605.25707 • Published 13 days ago • 6
SOD: Step-wise On-policy Distillation for Small Language Model Agents Paper • 2605.07725 • Published 30 days ago • 25
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 11 days ago • 420
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 17 days ago • 169
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 25 days ago • 270
SEIF: Self-Evolving Reinforcement Learning for Instruction Following Paper • 2605.07465 • Published 30 days ago • 30
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 249