ClawBench: Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces: everything you need to run, regrade, or compare on ClawBench.

NAIL-Group/ClawBench
ClawBench Leaderboard 🦀: Live leaderboard for the ClawBench web-agent benchmark
NAIL-Group/ClawBenchV1Trace
ClawBench: Can AI Agents Complete Everyday Online Tasks? (Paper 2604.08523, published Apr 9)