LLMs-to-test
Qwen/Qwen3-0.6B • Text Generation • 0.8B • Updated Jul 26, 2025 • 14.2M • 1.17k
Qwen/Qwen3-1.7B • Text Generation • 2B • Updated Jul 26, 2025 • 7.14M • 439
Qwen/Qwen3-4B • Text Generation • Updated Jul 26, 2025 • 8.69M • 586
Qwen/Qwen3-8B • Text Generation • 8B • Updated Jul 26, 2025 • 9.36M • 1.03k
Datasets-ScaleLLM
truthfulqa/truthful_qa • Updated Jan 4, 2024 • 1.63k • 84.1k • 278
allenai/qasc • Updated Jan 4, 2024 • 9.98k • 4.98k • 23
Anthropic/model-written-evals • Updated Dec 21, 2022 • 3.25k • 1.13k • 60
yesilhealth/Health_Benchmarks • Updated Apr 20, 2025 • 7.54k • 1.09k • 9