Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 2 days ago • 91
PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation Paper • 2511.18833 • Published Nov 24, 2025 • 5
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 5 days ago • 113
dots.mocr Collection Multimodal OCR: Parse Anything from Documents • 2 items • Updated 8 days ago • 7
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published 10 days ago • 302
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 11 days ago • 181
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published 17 days ago • 149
daVinci-Env: Open SWE Environment Synthesis at Scale Paper • 2603.13023 • Published 15 days ago • 30