Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

liked a dataset 12 days ago

derek-thomas/ScienceQA

Organizations

None yet

upvoted a paper 2 days ago

Learning to Repair Lean Proofs from Compiler Feedback

Paper • 2602.02990 • Published 23 days ago • 29

upvoted a paper 8 days ago

Experiential Reinforcement Learning

Paper • 2602.13949 • Published 11 days ago • 67

upvoted a paper 25 days ago

Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published 28 days ago • 100

upvoted an article about 1 month ago

Open Responses: What you need to know

Article • Published Jan 15 • 108

upvoted 3 papers about 1 month ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 108

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 155

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published Aug 28, 2025 • 37

upvoted a collection about 1 month ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 183

upvoted an article 2 months ago

From GRPO to DAPO and GSPO: What, Why, and How

Article • Published Aug 9, 2025 • 85

upvoted an article 3 months ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Published Dec 9, 2022 • 403

upvoted a paper 3 months ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132

upvoted 2 papers 5 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 80

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55

upvoted a collection 5 months ago

Qwen3-VL

37 items • Updated Dec 31, 2025 • 641

upvoted 2 papers 5 months ago

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24, 2025 • 120

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50

upvoted a collection 7 months ago

FastCuRL

The collection for the Paper "Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models" • 6 items • Updated May 29, 2025 • 3

upvoted a collection 8 months ago

"Physics of Language Models" series

7 items • Updated Dec 22, 2025 • 53

upvoted a paper 8 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263

upvoted a collection 8 months ago

Tool-Star

Tool-Star is a reinforcement learning-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasonin • 8 items • Updated Sep 2, 2025 • 5