Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios Paper • 2401.17167 • Published Jan 30, 2024
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance Paper • 2410.12361 • Published Oct 16, 2024
Boosting Tool Use of Large Language Models via Iterative Reinforced Fine-Tuning Paper • 2501.09766 • Published Jan 15, 2025
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs Paper • 2504.07866 • Published Apr 10, 2025
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation Paper • 2410.15164 • Published Oct 19, 2024
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution Paper • 2505.07512 • Published May 12, 2025
ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction Paper • 2508.12685 • Published Aug 18, 2025
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey Paper • 2506.11102 • Published Jun 6, 2025
Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents Paper • 2508.08645 • Published Aug 12, 2025
VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents Paper • 2509.07553 • Published Sep 9, 2025
A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges Paper • 2508.05668 • Published Aug 3, 2025
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent Paper • 2510.19386 • Published Oct 22, 2025
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published Nov 12, 2025