geodesic-research/nemotron_120b_warm_start_sft_200k_instruct • Text Generation • 124B • Updated Apr 28 • 184 • 1
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-Base-BF16 • Text Generation • 124B • Updated Mar 14 • 21.1k • 30
(Some) Emergent Misalignment from Reward Hacking in RL • Collection • Model checkpoints from the project "(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL" • 228 items • Updated 17 days ago • 5