LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model Paper • 2604.02097 • Published 9 days ago • 30
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 18 days ago • 121
Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training Paper • 2602.07824 • Published Feb 8 • 18
What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom Paper • 2602.01334 • Published Feb 1 • 3
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently Paper • 2602.02619 • Published Feb 2 • 53
daVinci-Dev: Agent-native Mid-training for Software Engineering Paper • 2601.18418 • Published Jan 26 • 126
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts Paper • 2601.11044 • Published Jan 16 • 34
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling Paper • 2601.03111 • Published Jan 6 • 10
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published Dec 29, 2025 • 66