# ELF-B reproduction checkpoints (PyTorch Lightning, OpenWebText)

Trained checkpoints from an unofficial PyTorch reproduction of ELF: Embedded Language Flows (Hu et al., 2026). Code, training/eval scripts, and reproduction artifacts live at https://github.com/Ugness/ELF-pytorch.

Results for ELF are not directly comparable with baselines (MDLM, Duo, FLM, ...) due to tokenization and preprocessing differences.

## Files

| Path | Size | Role |
|------|------|------|
| `last.ckpt` | 1.4 GB | Final EMA-bearing checkpoint (identical to `checkpoint_epoch05_step00228204.ckpt`); used for the headline 1000-sample eval. |
| `checkpoints/checkpoint_epoch00_step00038034.ckpt` … `checkpoints/checkpoint_epoch05_step00228204.ckpt` | 6 × 1.4 GB | Per-epoch checkpoints for reproducing the per-epoch trajectory. Optional. |
| `reproduction/config.yml` | | Resolved training-config snapshot from the actual run. |
| `reproduction/eval1000/{all_generated,metrics}.jsonl` | | 1000 generated samples + final Gen. PPL/entropy. |
| `reproduction/per_epoch/epoch_00{1..6}.jsonl` + `metrics.jsonl` | | 256 sanity samples per epoch + per-epoch metrics. |
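
The JSONL artifacts can be inspected without any of the training code. A minimal sketch, assuming one JSON object per line (the exact field names, e.g. a `"text"` key for samples, are assumptions; check the files themselves):

```python
import json

# Final 1000-sample eval: metrics record(s) and the raw generations.
# ASSUMPTION: one JSON object per line; field names may differ.
with open("reproduction/eval1000/metrics.jsonl") as f:
    for line in f:
        print(json.loads(line))  # final Gen. PPL / entropy

with open("reproduction/eval1000/all_generated.jsonl") as f:
    samples = [json.loads(line) for line in f]
print(f"{len(samples)} generated samples")
```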

## Headline numbers (32-step SDE, γ=1.5, SC-CFG=3, ELF-B, OpenWebText)

| Metric | Paper (TPU v5p-64) | This reproduction (8× B200, DDP, Lightning) |
|--------|--------------------|---------------------------------------------|
| Gen. PPL ↓ | 24.1 | 25.61 |
| Entropy ↑ | 5.15 | 5.20 |
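
Gen. PPL is generative perplexity: the generated samples are scored under an external autoregressive LM, and entropy measures the token diversity of the samples. A minimal sketch of the Gen. PPL computation, assuming `gpt2-large` as the scoring model and a `"text"` field in the samples file (both assumptions; `eval_lightning.py` in the code repo defines the actual protocol):

```python
import json
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ASSUMPTION: gpt2-large as the external scorer; the repo's eval script
# may use a different model, context length, or batching.
tok = AutoTokenizer.from_pretrained("gpt2-large")
lm = AutoModelForCausalLM.from_pretrained("gpt2-large").eval().cuda()

total_nll, total_tokens = 0.0, 0
with open("reproduction/eval1000/all_generated.jsonl") as f:
    for line in f:
        text = json.loads(line)["text"]  # ASSUMPTION: field name
        ids = tok(text, return_tensors="pt").input_ids.cuda()
        with torch.no_grad():
            loss = lm(ids, labels=ids).loss  # mean next-token NLL
        n = ids.numel() - 1                  # number of predictions
        total_nll += loss.item() * n
        total_tokens += n

print("Gen. PPL:", math.exp(total_nll / total_tokens))
```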

## Per-epoch trajectory

| Epoch | Step | Gen. PPL | Entropy |
|-------|---------|----------|---------|
| 1 | 38 034 | 2.73¹ | 0.70¹ |
| 2 | 76 068 | 37.11 | 5.17 |
| 3 | 114 102 | 28.63 | 5.21 |
| 4 | 152 136 | 25.00 | 5.16 |
| 5 | 190 170 | 25.58 | 5.19 |
| 6 | 228 204 | 26.11 | 5.21 |

¹ Epoch 1 is degenerate: entropy ≈ 0.7 indicates highly repetitive text, which is also why the Gen. PPL looks misleadingly low. The run only begins producing fluent text from epoch 2 onward.

## Quickstart

```bash
pip install huggingface_hub

# Final EMA checkpoint only (recommended)
huggingface-cli download Ugness/elf-torch last.ckpt --local-dir ./elf-b/

# Then, from the code repo (https://github.com/Ugness/ELF-pytorch):
cd pytorch_lightning/
torchrun --nproc_per_node=8 --master_port=29510 eval_lightning.py \
    --config configs/training_configs/train_owt_ELF-B.yml \
    --checkpoint_path /path/to/elf-b/last.ckpt \
    --num_samples 1000
# Expected: Gen. PPL ≈ 25.6, sample entropy ≈ 5.20.
```
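
To pull the optional per-epoch checkpoints programmatically instead, `huggingface_hub` works directly. A sketch, assuming the filename pattern inferred from the two checkpoint names in the Files table (epochs 00–05, steps in multiples of 38 034; verify against the repo's file listing):

```python
from huggingface_hub import hf_hub_download

# ASSUMPTION: filename pattern inferred from the listed checkpoints
# checkpoint_epoch00_step00038034.ckpt ... checkpoint_epoch05_step00228204.ckpt
for epoch in range(6):
    path = hf_hub_download(
        repo_id="Ugness/elf-torch",
        filename=f"checkpoints/checkpoint_epoch{epoch:02d}_step{38034 * (epoch + 1):08d}.ckpt",
        local_dir="./elf-b",
    )
    print("downloaded", path)
```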

## Reproduction details

- Hardware: 8× NVIDIA B200 (sm_100), CUDA 12.8.
- Framework: PyTorch Lightning with DDP.
- Wall-clock: ~3 h/epoch × 6 epochs ≈ 18 h.
- Precision: fp32.
- Epochs: 6 (the paper used 5); one extra epoch to reach entropy ≈ 5.20.
- All other math is identical to the official JAX/Flax implementation at https://github.com/lillian039/ELF.
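
The `.ckpt` files are standard Lightning checkpoints, i.e. `torch.load`-able dicts whose top-level keys include `state_dict`, `epoch`, and `global_step`. A minimal inspection sketch (where exactly the EMA weights live depends on the repo's EMA callback, so treat the key layout as an assumption to verify):

```python
import torch

# weights_only=False: Lightning checkpoints carry non-tensor metadata.
ckpt = torch.load("elf-b/last.ckpt", map_location="cpu", weights_only=False)
print(sorted(ckpt.keys()))  # expect 'state_dict', 'epoch', 'global_step', ...
print("epoch:", ckpt.get("epoch"), "step:", ckpt.get("global_step"))

# Peek at a few parameter names/shapes from the raw state dict.
for name, tensor in list(ckpt["state_dict"].items())[:5]:
    print(name, tuple(tensor.shape))
```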

## License & citation

MIT, same as the code repo. Please cite the original paper:

```bibtex
@article{elf2026,
  title={ELF: Embedded Language Flows},
  author={Hu, Keya and Qiu, Linlu and Lu, Yiyang and Zhao, Hanhong and Li, Tianhong and Kim, Yoon and Andreas, Jacob and He, Kaiming},
  journal={arXiv preprint arXiv:2605.10938},
  year={2026}
}
```

This reproduction was heavily developed with Claude Code.
