Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization Paper • 2602.23008 • Published 4 days ago • 33
NbAiLabArchive/whisper-large-v2-nob Automatic Speech Recognition • 2B • Updated Sep 13, 2023 • 10 • 13
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key Paper • 2501.09695 • Published Jan 16, 2025 • 1
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs Paper • 2505.12929 • Published May 19, 2025 • 3