OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification Paper • 2606.01476 • Published 4 days ago • 7
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification Paper • 2606.01476 • Published 4 days ago • 7
Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates Paper • 2606.03029 • Published 2 days ago • 4
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning Paper • 2605.22642 • Published 14 days ago • 35
Synthetic Sandbox for Training Machine Learning Engineering Agents Paper • 2604.04872 • Published Apr 6 • 14
Synthetic Sandbox for Training Machine Learning Engineering Agents Paper • 2604.04872 • Published Apr 6 • 14
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 40
First Frame Is the Place to Go for Video Content Customization Paper • 2511.15700 • Published Nov 19, 2025 • 54
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 85