Multi-Turn Evaluation Benchmarks
A collection of benchmarks for evaluating LMs or VLMs under multi-turn interaction
  passing2961/MultiVerse • Viewer • Updated Nov 1, 2025 • 647 • 124 • 1
  passing2961/photochat_plus • Viewer • Updated Dec 3, 2024 • 968 • 148 • 4
  RefineBench/RefineBench • Viewer • Updated Dec 2, 2025 • 1k • 1.32k • 5
Stark
Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
  passing2961/stark-face-image • Viewer • Updated Nov 6, 2024 • 93.6k • 94 • 3
  passing2961/stark-summary • Viewer • Updated Nov 6, 2024 • 53.3k • 119 • 2
  passing2961/stark-image-url • Viewer • Updated Nov 6, 2024 • 899k • 232 • 1
  passing2961/stark-image • Viewer • Updated Nov 6, 2024 • 1.72M • 225 • 3
DialogCC
General multi-modal conversation datasets
  passing2961/dialogcc • Viewer • Updated Jun 24, 2024 • 83.4k • 55 • 10
Thanos
Skill-of-Mind-Infused LLM
  passing2961/Thanos-1B • 1B • Updated Nov 8, 2024 • 17
  passing2961/Thanos-3B • 3B • Updated Nov 8, 2024 • 2 • 4
  passing2961/Thanos-8B • 8B • Updated Nov 8, 2024 • 1 • 3
  passing2961/multifaceted-skill-of-mind • Viewer • Updated Nov 8, 2024 • 100k • 116 • 5
Ultron
Multi-modal conversation model & Multi-modal dialogue summarization model
  passing2961/Ultron-Summarizer-1B • 1B • Updated Nov 6, 2024 • 2
  passing2961/Ultron-Summarizer-3B • 3B • Updated Nov 6, 2024 • 5 • 3
  passing2961/Ultron-Summarizer-8B • 8B • Updated Nov 6, 2024 • 2 • 2
  passing2961/Ultron-11B • 11B • Updated Nov 6, 2024 • 5 • 1