Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published 8 days ago • 48
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 4 days ago • 79
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 4 days ago • 71
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 23 days ago • 20
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation Paper • 2601.10061 • Published Jan 15 • 31
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published Oct 8, 2025 • 30
LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals Paper • 2509.21875 • Published Sep 26, 2025 • 10
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding Paper • 2509.23050 • Published Sep 27, 2025 • 15