timaeus/rl-lm-pythia160m-sentiment-pos-beta0-grpo-nostd-gs4-tp1-tk0-pt80000-steerDotIncL1c512s16-seed21 Updated 2 days ago • 1
DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization Paper • 2605.31455 • Published May 29 • 6
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published May 21 • 171
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 274