ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models Paper • 2405.13729 • Published 14 days ago • 13
Map2World: Segment Map Conditioned Text to 3D World Generation Paper • 2605.00781 • Published 12 days ago • 25
VLS: Steering Pretrained Robot Policies via Vision-Language Models Paper • 2602.03973 • Published Feb 3 • 22
HeartMuLa: A Family of Open Sourced Music Foundation Models Paper • 2601.10547 • Published Jan 15 • 49
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 245
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025 • 157
Introducing Waypoint-1: Real-time interactive video diffusion from Overworld Article • lapp0, LouisCastricato, ScottieFox, shahbuland, xAesthetics • Jan 20 • 43
We Got Claude to Build CUDA Kernels and teach open models! Article • burtenshaw, evalstate, merve, pcuenq • Jan 28 • 156
Preference Optimization for Vision Language Models Article • qgallouedec, vwxyzjn, merve, kashif • Jul 10, 2024 • 93
SmolLM3: smol, multilingual, long-context reasoner Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773