Spurious Rewards
Spurious Rewards: Rethinking Training Signals in RLVR
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13, 2025 • 1
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13, 2025 • 3
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13, 2025 • 604
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13, 2025 • 2
Personalized Reasoning
stellalisy/personalized_math Preview • Updated Aug 26, 2025 • 4 • 1
stellalisy/personalized_aime Preview • Updated Aug 26, 2025 • 3
stellalisy/personalized_simpleqa Preview • Updated Aug 26, 2025 • 3
stellalisy/personalized_mascqa Preview • Updated Aug 26, 2025 • 2