Lance: Unified Multimodal Modeling by Multi-Task Synergy Paper • 2605.18678 • Published 9 days ago • 74
EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding Paper • 2605.09874 • Published 16 days ago • 2
jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition Paper • 2605.08384 • Published 19 days ago • 10
jina-embeddings-v5-omni Collection Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each. • 27 items • Updated 14 days ago • 36
CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models Paper • 2605.08735 • Published 18 days ago • 69
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 20 days ago • 45
Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents Article • nvidia • 28 days ago • 59
VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph Paper • 2602.12735 • Published Feb 13 • 8
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM Paper • 2509.21990 • Published Sep 26, 2025 • 1
NLE: Non-autoregressive LLM-based ASR by Transcript Editing Paper • 2603.08397 • Published Mar 9 • 23
MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss Paper • 2508.05772 • Published Aug 7, 2025 • 3
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought Paper • 2505.19877 • Published May 26, 2025 • 4