InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published 19 days ago • 47
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published Dec 26, 2025 • 61
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture Paper • 2512.21675 • Published Dec 25, 2025 • 25
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Paper • 2511.20649 • Published Nov 25, 2025 • 50
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 114
PICABench: How Far Are We from Physically Realistic Image Editing? Paper • 2510.17681 • Published Oct 20, 2025 • 65