FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search Paper • 2606.00660 • Published 21 days ago • 8
RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator Paper • 2605.21748 • Published May 20 • 17
MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocal Paper • 2605.07249 • Published May 8 • 3
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation Paper • 2605.11739 • Published May 13 • 59
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization Paper • 2605.10780 • Published May 12 • 33
openai/whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 7.85M • • 3.1k
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published May 7 • 115
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published May 6 • 105
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 170
Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models Paper • 2604.16593 • Published Apr 17 • 6
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 327