LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model Paper • 2312.17240 • Published Dec 28, 2023 • 1
RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models Paper • 2505.19125 • Published May 25, 2025
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published Dec 18, 2025 • 20
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 4 days ago • 88
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published Dec 18, 2025 • 11
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published Dec 18, 2025 • 11
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Paper • 2505.12081 • Published May 17, 2025 • 18
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma? Paper • 2503.12496 • Published Mar 16, 2025 • 1
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 48