VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models Paper • 2603.22003 • Published 4 days ago • 11
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published Feb 9 • 52
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025 • 134
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 181
MGM-Omni Collection MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech • 13 items • Updated 25 days ago • 11