Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 13 days ago • 189
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR Paper • 2605.20164 • Published 6 days ago • 6
When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels Paper • 2605.06652 • Published 18 days ago • 5
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 Sentence Similarity • 0.1B • Updated Jan 28 • 49.1M • 1.24k
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 629
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems Paper • 2604.03295 • Published Mar 27 • 10