MAOAM: Unified Object and Material Selection with Vision-Language Models • Paper • 2606.04880 • Published 19 days ago • 10
SMART • Collection • Your Single-Vector Embedding Model is SMARTer Than You Think • 5 items • Updated 26 days ago • 2
From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing • Paper • 2605.15181 • Published May 14 • 12
Exploration and Exploitation Errors Are Measurable for Language Model Agents • Paper • 2604.13151 • Published Apr 14 • 25
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs • Paper • 2603.18004 • Published Mar 18 • 14
Contamination Detection for VLMs using Multi-Modal Semantic Perturbation • Paper • 2511.03774 • Published Nov 5, 2025 • 13
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos • Paper • 2410.02763 • Published Oct 3, 2024 • 7