Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF Paper • 2410.04612 • Published Oct 6, 2024
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities Paper • 2407.12982 • Published Jul 17, 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards Paper • 2404.16767 • Published Apr 25, 2024
Provable Reward-Agnostic Preference-Based Reinforcement Learning Paper • 2305.18505 • Published May 29, 2023
Compositional Semantic Parsing with Large Language Models Paper • 2209.15003 • Published Sep 29, 2022
PaRaDe: Passage Ranking using Demonstrations with Large Language Models Paper • 2310.14408 • Published Oct 22, 2023
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second Paper • 2306.07552 • Published Jun 13, 2023