EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies
Paper • 2606.20092 • Published • 5
Computer Vision
Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction
RIVER: A Real-Time Interaction Benchmark for Video LLMs