Alignment - a Ambroser53 Collection

Ambroser53 's Collections

active learning

RL

Alignment

updated Oct 23, 2024

Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published May 14, 2024 • 18
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

Paper • 2405.19332 • Published May 29, 2024 • 22
Offline Regularised Reinforcement Learning for Large Language Models Alignment

Paper • 2405.19107 • Published May 29, 2024 • 15
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Paper • 2406.00888 • Published Jun 2, 2024 • 33
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Paper • 2406.02900 • Published Jun 5, 2024 • 13
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM

Paper • 2406.12168 • Published Jun 18, 2024 • 7
Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Paper • 2406.10023 • Published Jun 14, 2024 • 2
Bootstrapping Language Models with DPO Implicit Rewards

Paper • 2406.09760 • Published Jun 14, 2024 • 41
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

Paper • 2407.13833 • Published Jul 18, 2024 • 12
Baichuan Alignment Technical Report

Paper • 2410.14940 • Published Oct 19, 2024 • 51