Paper: [On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models](https://arxiv.org/abs/2512.07783) (arXiv 2512.07783)
This repository contains the op11-14 CPT checkpoints and the corresponding local RL outputs used by `scripts/composition/op-difficulty-10B/script_cpt_rl/id2-10_0.2easy_0.3medium_0.5hard_cpt11-14`.

For pretraining, only the `cpt0.2-uniform_0.8-11-14_plus` run is included. For RL, only the final `actor/huggingface` checkpoints that were found locally are uploaded.

| Path | Checkpoint | Used by nominal step / CPT epoch |
|---|---|---|
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387` | checkpoint-387 | 50step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774` | checkpoint-774 | 100step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1161` | checkpoint-1161 | 50step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548` | checkpoint-1548 | 200step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935` | checkpoint-1935 | 100step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-2322` | checkpoint-2322 | 300step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096` | checkpoint-3096 | 100step/0.8, 400step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870` | checkpoint-3870 | 500step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644` | checkpoint-4644 | 600step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6192` | checkpoint-6192 | 300step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579` | checkpoint-6579 | 800step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740` | checkpoint-7740 | 954step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127` | checkpoint-8127 | 400step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-9675` | checkpoint-9675 | 300step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062` | checkpoint-10062 | 500step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997` | checkpoint-11997 | 600step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771` | checkpoint-12771 | 400step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867` | checkpoint-15867 | 800step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254` | checkpoint-16254 | 500step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963` | checkpoint-18963 | 954step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350` | checkpoint-19350 | 600step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542` | checkpoint-25542 | 800step/0.8 |

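
Since every midtrain path in the table shares one prefix and differs only in the checkpoint step, the `subfolder` string can be built from the step number. The following is a minimal sketch; `MIDTRAIN_ROOT`, `MIDTRAIN_STEPS`, and `midtrain_subfolder` are hypothetical helper names introduced here for illustration and are not part of the repository.

```python
# Hypothetical helper: build the repo-relative path of a midtrain checkpoint.
# The prefix and the step numbers are copied from the table above.
MIDTRAIN_ROOT = (
    "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus"
)

MIDTRAIN_STEPS = [
    387, 774, 1161, 1548, 1935, 2322, 3096, 3870, 4644, 6192, 6579,
    7740, 8127, 9675, 10062, 11997, 12771, 15867, 16254, 18963, 19350, 25542,
]


def midtrain_subfolder(step: int) -> str:
    """Return the subfolder for a listed checkpoint step, or raise ValueError."""
    if step not in MIDTRAIN_STEPS:
        raise ValueError(f"checkpoint-{step} is not listed in the table")
    return f"{MIDTRAIN_ROOT}/checkpoint-{step}"


if __name__ == "__main__":
    print(midtrain_subfolder(3096))
```

Rejecting unlisted steps up front gives a clear error before any download is attempted.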
| Path | Nominal step | CPT epoch | Source CPT checkpoint | Uploaded checkpoint |
|---|---|---|---|---|
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL` | 50 | 0.2 | checkpoint-387 | global_step_40 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-50step-0.5RL` | 50 | 0.5 | checkpoint-1161 | global_step_25 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-50step-0.2RL` | 50 | 0.8 | checkpoint-1548 | global_step_9 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL` | 100 | 0.8 | checkpoint-3096 | global_step_19 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL` | 100 | 0.5 | checkpoint-1935 | global_step_50 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL` | 100 | 0.2 | checkpoint-774 | global_step_80 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-200step-0.2RL` | 200 | 0.8 | checkpoint-6579 | global_step_39 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-200step-0.5RL` | 200 | 0.5 | checkpoint-3870 | global_step_100 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL` | 200 | 0.2 | checkpoint-1548 | global_step_160 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-300step-0.2RL` | 300 | 0.8 | checkpoint-9675 | global_step_59 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-300step-0.5RL` | 300 | 0.5 | checkpoint-6192 | global_step_150 |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-300step-0.8RL` | 300 | 0.2 | checkpoint-2322 | global_step_240 |

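
The RL directory names in the table encode the CPT epoch, the nominal step, and a trailing `…RL` value. The sketch below recovers those fields from a path with a regular expression. The function name `parse_rl_run` and the dictionary keys are hypothetical, and what the trailing `RL` value means (it appears to be the RL share of the budget, since it sums to 1.0 with the CPT value in every listed row) is an assumption, not something this README states.

```python
import re

# Pattern inferred from the directory names in the table above, e.g.
#   cpt0.2-rl-op11-14_uniform-50step-0.8RL
RL_NAME = re.compile(
    r"^cpt(?P<cpt>\d+(?:\.\d+)?)-rl-op11-14_uniform-"
    r"(?P<step>\d+)step-(?P<rl>\d+(?:\.\d+)?)RL$"
)


def parse_rl_run(path: str) -> dict:
    """Split an RL run path into its encoded fields (hypothetical helper)."""
    name = path.rstrip("/").split("/")[-1]  # keep only the last path component
    match = RL_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized RL run name: {name!r}")
    return {
        "cpt_epoch": float(match["cpt"]),
        "nominal_step": int(match["step"]),
        # Assumption: the value before "RL" is the RL share of the budget.
        "rl_tag": float(match["rl"]),
    }


if __name__ == "__main__":
    print(parse_rl_run(
        "id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL"
    ))
```

Parsing the name rather than hard-coding each row keeps the `Nominal step` and `CPT epoch` columns checkable against the directory names.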
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Interplay-LM-Reasoning/extrapolation_midtrain"
# Path of the checkpoint inside the repository (see the tables above).
subdir = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"

# `subfolder` selects a single checkpoint directory within the repo.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
```

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
  title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07783},
}
```