BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models Paper • 2509.24210 • Published Sep 29, 2025
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation Paper • 2511.15958 • Published Nov 20, 2025
SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning Paper • 2512.03244 • Published Dec 2, 2025
OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning Paper • 2510.18032 • Published Oct 20, 2025