GENIUS: Generative Fluid Intelligence Evaluation Suite Paper • 2602.11144 • Published about 20 hours ago • 37
Chain of Mindset: Reasoning with Adaptive Cognitive Modes Paper • 2602.10063 • Published 2 days ago • 69
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing Paper • 2602.01851 • Published 10 days ago • 16
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation Paper • 2601.10061 • Published 28 days ago • 30
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 34
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning Paper • 2510.23473 • Published Oct 27, 2025 • 85
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks Paper • 2510.19195 • Published Oct 22, 2025 • 11
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs Paper • 2510.24514 • Published Oct 28, 2025 • 22
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning Paper • 2510.13515 • Published Oct 15, 2025 • 12