Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning Paper • 2606.10968 • Published 3 days ago • 41
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning Paper • 2606.10968 • Published 3 days ago • 41 • 3
Flow-DPPO: GenEval2 Collection Flow-DPPO-trained LoRA adapters (single- and multi-reward) for SD3.5 and FLUX.2-klein-9B optimized on GenEval2. • 5 items • Updated 1 day ago
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning Paper • 2606.10968 • Published 3 days ago • 41