From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published 8 days ago • 148
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 Jul 5, 2024 • 315
From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing Paper • 2411.11916 • Published Nov 18, 2024 • 3