view article Article Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation 13 days ago • 14
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published Feb 2 • 52
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior Paper • 2512.20757 • Published Dec 23, 2025 • 18
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 127
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 214
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 63
How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published Dec 1, 2025 • 57
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs Paper • 2509.24107 • Published Sep 28, 2025 • 80
Running on CPU Upgrade Featured 3.06k The Smol Training Playbook 📚 3.06k The secrets to building world-class LLMs
DRBench: A Realistic Benchmark for Enterprise Deep Research Paper • 2510.00172 • Published Sep 30, 2025 • 1
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold Paper • 2510.15862 • Published Oct 17, 2025 • 10