Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) ariG23498 • Jan 19, 2025 • 53
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 83