Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 1 day ago • 43
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 2 days ago • 104
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis Paper • 2605.22570 • Published 7 days ago • 23
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 156
Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published Mar 23 • 48
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published Mar 23 • 126
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection Paper • 2603.21944 • Published Mar 23 • 26
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published Mar 18 • 18
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model Paper • 2603.18524 • Published Mar 19 • 58
Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model Paper • 2510.00862 • Published Oct 1, 2025
DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion Paper • 2506.01454 • Published Jun 2, 2025
OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics Paper • 2512.08625 • Published Dec 9, 2025 • 1
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 110
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published Mar 17 • 109