MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data Paper • 2603.25319 • Published 4 days ago • 27
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 4 days ago • 112
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 5 days ago • 87
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience Paper • 2603.24533 • Published 5 days ago • 40
Feb 15–Mar 25 '26 Collection Multimodal Collections Sublist • 7 items • Updated about 23 hours ago • 1
VGA [* Polaris Series] Collection capable of accurately locating and understanding 'any' object | State: Experimental, Category: Object Detection, High-depth analysis in visual tasks • 7 items • Updated 4 days ago • 1
Article SynthVision: Building a 110K Synthetic Medical VQA Dataset with Cross-Model Validation 7 days ago • 13
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published 11 days ago • 63
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published 11 days ago • 66
Article ATE-2: State-of-the-Art Armenian Text Embeddings and the ArmBench-TextEmbed Benchmark 11 days ago • 8
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published 19 days ago • 151
Article The First Healthcare Robotics Dataset and Foundational Physical AI Models for Healthcare Robotics 14 days ago • 22
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 14 days ago • 149
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published 14 days ago • 148