CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 4 days ago • 75
Learning, Fast and Slow: Towards LLMs That Adapt Continually Paper • 2605.12484 • Published 5 days ago • 16
Discovering Reinforcement Learning Interfaces with Large Language Models Paper • 2605.03408 • Published 12 days ago • 3
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published 17 days ago • 57
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling Paper • 2604.20720 • Published 25 days ago • 2
lihaoxin2020/qwen3-4B-refiner-3201-rl-balanced-step50 Text Generation • 196k • Updated Apr 12 • 4 • 1
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 324
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 364
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 350
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 156