Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging Paper • 2605.29489 • Published 11 days ago • 4
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published 11 days ago • 192
QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents Paper • 2605.27068 • Published 13 days ago • 24
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 18 days ago • 169
AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs Paper • 2605.15565 • Published 24 days ago • 16
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 166
CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation Paper • 2604.05467 • Published Apr 7 • 7
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 327
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation Paper • 2604.03922 • Published Apr 5 • 53
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 506
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 632
A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning Paper • 2604.03995 • Published Apr 5 • 4
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation Paper • 2603.28068 • Published Mar 31 • 13
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published Mar 30 • 58
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math Paper • 2603.24961 • Published Mar 26 • 4