SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model Paper • 2602.21818 • Published 7 days ago • 52
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models Paper • 2601.11404 • Published Jan 16 • 26
Act2Goal: From World Model To General Goal-conditioned Policy Paper • 2512.23541 • Published Dec 29, 2025 • 23
Bidirectional Normalizing Flow: From Data to Noise and Back Paper • 2512.10953 • Published Dec 11, 2025 • 7
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published Aug 7, 2025 • 73
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published Oct 27, 2025 • 18 • 2
Fidelity-Aware Data Composition for Robust Robot Generalization Paper • 2509.24797 • Published Sep 29, 2025 • 2
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training Paper • 2509.24494 • Published Sep 29, 2025 • 11