Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published Oct 7, 2025 • 33
Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics Paper • 2510.17797 • Published Oct 20, 2025 • 11
xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning Paper • 2510.08439 • Published Oct 9, 2025 • 1
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering Paper • 2511.13998 • Published Nov 17, 2025 • 3
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows? Paper • 2605.16679 • Published 7 days ago • 48
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows? Paper • 2605.16679 • Published 7 days ago • 48
xLAM: A Family of Large Action Models to Empower AI Agent Systems Paper • 2409.03215 • Published Sep 5, 2024 • 6
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models Paper • 2503.22673 • Published Mar 28, 2025 • 12
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay Paper • 2504.03601 • Published Apr 4, 2025 • 18
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17, 2025 • 21
UserBench: An Interactive Gym Environment for User-Centric Agents Paper • 2507.22034 • Published Jul 29, 2025 • 30
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering Paper • 2509.09614 • Published Sep 11, 2025 • 7
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Paper • 2509.19736 • Published Sep 24, 2025 • 12
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published Oct 7, 2025 • 33