ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models • Paper • 2603.13033 • Published 9 days ago • 13
Cosmos-Tokenizer • Collection • A suite of image and video tokenizers • 12 items • Updated 2 days ago • 43