Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models Paper • 2306.04675 • Published Jun 7, 2023 • 1
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents Paper • 2606.05296 • Published 3 days ago • 8
Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality Paper • 2506.20978 • Published Jun 26, 2025 • 1
RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator Paper • 2605.21748 • Published 17 days ago • 16
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models Paper • 2605.27311 • Published 11 days ago • 3
Beyond Procedure: Substantive Fairness in Conformal Prediction Paper • 2602.16794 • Published Feb 18 • 1
On the Burden of Achieving Fairness in Conformal Prediction Paper • 2605.14260 • Published 22 days ago • 1
LLM Safety From Within: Detecting Harmful Content with Internal Representations Paper • 2604.18519 • Published Apr 20 • 26
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning Paper • 2605.02913 • Published Apr 8 • 9