view article Article Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation 12 days ago • 13
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published Feb 2 • 52
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior Paper • 2512.20757 • Published Dec 23, 2025 • 18
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 127
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 214
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 63
How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published Dec 1, 2025 • 57
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs Paper • 2509.24107 • Published Sep 28, 2025 • 80
DRBench: A Realistic Benchmark for Enterprise Deep Research Paper • 2510.00172 • Published Sep 30, 2025 • 1
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold Paper • 2510.15862 • Published Oct 17, 2025 • 10
Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics Paper • 2510.17797 • Published Oct 20, 2025 • 11
Towards Personalized Deep Research: Benchmarks and Evaluations Paper • 2509.25106 • Published Sep 29, 2025 • 30
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25, 2025 • 34
InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published Oct 28, 2025 • 99