Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
upvoted a paper about 17 hours ago
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
submitted a paper about 17 hours ago
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
Organizations
None yet