Penghui Qi

QPHutu

AI & ML interests

None yet

Recent Activity

authored a paper 1 day ago

Rethinking the Divergence Regularization in LLM RL

upvoted a paper 3 days ago

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

upvoted a paper 3 days ago

Rethinking the Divergence Regularization in LLM RL

authored a paper 1 day ago

Rethinking the Divergence Regularization in LLM RL

Paper • 2606.09821 • Published 5 days ago • 32

upvoted 2 papers 3 days ago

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Paper • 2606.11025 • Published 4 days ago • 40

Rethinking the Divergence Regularization in LLM RL

Paper • 2606.09821 • Published 5 days ago • 32

upvoted a paper 4 months ago

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 75

authored a paper 4 months ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 37

upvoted a paper 4 months ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 37

submitted a paper to Daily Papers 4 months ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 37

authored a paper 4 months ago

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published Jan 27 • 8

upvoted a paper 5 months ago

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published Jan 27 • 8

liked 2 datasets 7 months ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 3.18k • 46

zwhe99/DeepMath-103K

Viewer • Updated May 29, 2025 • 103k • 7.18k • 365

updated a dataset 7 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 70 • 7

liked a dataset 7 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 70 • 7

updated a collection 7 months ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025

published a dataset 7 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 70 • 7

updated a collection 7 months ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025

liked a model 7 months ago

zz1358m/SofT-GRPO-master

Updated Nov 13, 2025 • 8

upvoted a paper 7 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132

authored a paper 7 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 32

upvoted a paper 7 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 32
