SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 25 days ago • 86
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation Paper • 2605.13724 • Published 26 days ago • 101
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published 21 days ago • 112
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 27 days ago • 191
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published Feb 27 • 41
Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance Paper • 2603.02175 • Published Mar 2 • 24
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 201
Urban Socio-Semantic Segmentation with Vision-Language Reasoning Paper • 2601.10477 • Published Jan 15 • 155
Region-Constraint In-Context Generation for Instructional Video Editing Paper • 2512.17650 • Published Dec 19, 2025 • 53
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Paper • 2510.01284 • Published Sep 30, 2025 • 37
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 482
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published Mar 30, 2025 • 57
MotionPro: A Precise Motion Controller for Image-to-Video Generation Paper • 2505.20287 • Published May 26, 2025 • 20
Wonderland: Navigating 3D Scenes from a Single Image Paper • 2412.12091 • Published Dec 16, 2024 • 16
Stable Flow: Vital Layers for Training-Free Image Editing Paper • 2411.14430 • Published Nov 21, 2024 • 22