Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency Paper • 2501.04931 • Published Jan 9, 2025
From reactive to cognitive: brain-inspired spatial intelligence for embodied agents Paper • 2508.17198 • Published Aug 24, 2025 • 10
SFHand: A Streaming Framework for Language-guided 3D Hand Forecasting and Embodied Manipulation Paper • 2511.18127 • Published Nov 22, 2025 • 1
Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions Paper • 2510.27195 • Published Oct 31, 2025 • 1
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 3 days ago • 157
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
firdhokk/speech-emotion-recognition-with-openai-whisper-large-v3 Audio Classification • 0.6B • Updated Nov 1, 2025 • 35.6k • 110
DragGan - Drag Your GAN Space • Featured • 1.03k • Manipulate images by dragging points