VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions Paper • 2605.27141 • Published 6 days ago • 16
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling Paper • 2310.04691 • Published Oct 7, 2023 • 3
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published 7 days ago • 100
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment Paper • 2604.11689 • Published Apr 13 • 21