Post
63
🎰 Casino Benchmark: Dataset + Space
nyuuzyou/casino-benchmark
nyuuzyou/casino-benchmark
14 models faced 1,400 simulations of heads-up Blackjack and European Roulette. Shared seeds locked identical cards and spins for each.
Key Stats:
- 14 models benchmarked
- 59,483 rows
- 35 MB compressed Parquet
- 35,000 scored decisions
- Full prompts, JSON responses, reasoning traces, latency
- Bankroll tracking from $1,000 start per run
Live leaderboard tracks bets, hits, stands, and risk management.
Gemini 3 Flash leads at +$3,396. Claude 4.5 Haiku at -$7,788.
Traces in the dataset. Leaderboard in the space.
nyuuzyou/casino-benchmark
nyuuzyou/casino-benchmark
14 models faced 1,400 simulations of heads-up Blackjack and European Roulette. Shared seeds locked identical cards and spins for each.
Key Stats:
- 14 models benchmarked
- 59,483 rows
- 35 MB compressed Parquet
- 35,000 scored decisions
- Full prompts, JSON responses, reasoning traces, latency
- Bankroll tracking from $1,000 start per run
Live leaderboard tracks bets, hits, stands, and risk management.
Gemini 3 Flash leads at +$3,396. Claude 4.5 Haiku at -$7,788.
Traces in the dataset. Leaderboard in the space.