improve: 20 tasks, richer keywords, enhanced reward/grader, bigram matching, compelling README b83c8ad hellinferno committed 27 days ago
fix: add run_episode wrapper, use .2f score format, update test for strict bounds 11aa990 hellinferno committed 27 days ago
fix: clamp scores to strict (0,1) exclusive — validator requires no 0.0 or 1.0 2c28868 hellinferno committed 27 days ago
fix: align score formatting decimals with strict hackathon template spec 52710e7 hellinferno committed 27 days ago
fix: bulletproof inference.py with Docker + fallback connection methods 2a92b3a hellinferno committed 27 days ago
fix: correct inference log format, align openenv.yaml task IDs, harden Dockerfile 852b5ea hellinferno, Claude Sonnet 4.6 committed 27 days ago