MercedeSnape's Collections

agentic RL

updated 3 days ago

  • Scaling Agent Learning via Experience Synthesis

    Paper • 2511.03773 • Published Nov 5, 2025 • 83

    Note: for online RL training; "distilled into an experience model"


  • ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

    Paper • 2511.21689 • Published Nov 26, 2025 • 126

  • GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

    Paper • 2601.05242 • Published Jan 8, 2026 • 230

  • Reinforcement Learning for Self-Improving Agent with Skill Library

    Paper • 2512.17102 • Published Dec 18, 2025 • 42

  • ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

    Paper • 2601.21558 • Published Jan 29, 2026 • 60

  • DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

    Paper • 2511.22570 • Published Nov 27, 2025 • 93