# ELF-B reproduction checkpoints (PyTorch Lightning, OpenWebText)

Trained checkpoints from an unofficial PyTorch reproduction of ELF: Embedded Language Flows (Hu et al., 2026). Code, training/eval scripts, and reproduction artifacts live at https://github.com/Ugness/ELF-pytorch.

Results for ELF are not directly comparable with baselines (MDLM, Duo, FLM, ...) due to tokenization and preprocessing differences.

## Files

| Path | Size | Role |
|------|------|------|
| `last.ckpt` | 1.4 GB | Final EMA-bearing checkpoint (identical to `checkpoint_epoch05_step00228204.ckpt`); used for the headline 1000-sample eval. |
| `checkpoints/checkpoint_epoch00_step00038034.ckpt` … `checkpoints/checkpoint_epoch05_step00228204.ckpt` | 6 × 1.4 GB | Per-epoch checkpoints for reproducing the per-epoch trajectory. Optional. |
| `reproduction/config.yml` | | Resolved training-config snapshot from the actual run. |
| `reproduction/eval1000/{all_generated,metrics}.jsonl` | | 1000 generated samples + final Gen. PPL/entropy. |
| `reproduction/per_epoch/epoch_00{1..6}.jsonl` + `metrics.jsonl` | | 256 sanity samples per epoch + per-epoch metrics. |
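
The JSONL artifacts can be inspected without any of the training code. A minimal sketch, assuming one JSON object per line (the exact field names, e.g. a `"text"` key for samples, are assumptions; check the files themselves):

```python
import json

# Final 1000-sample eval: metrics record(s) and the raw generations.
# ASSUMPTION: one JSON object per line; field names may differ.
with open("reproduction/eval1000/metrics.jsonl") as f:
    for line in f:
        print(json.loads(line))  # final Gen. PPL / entropy

with open("reproduction/eval1000/all_generated.jsonl") as f:
    samples = [json.loads(line) for line in f]
print(f"{len(samples)} generated samples")
```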

## Headline numbers (32-step SDE, γ=1.5, SC-CFG=3, ELF-B, OpenWebText)

| Metric | Paper (TPU v5p-64) | This reproduction (8× B200, DDP, Lightning) |
|--------|--------------------|---------------------------------------------|
| Gen. PPL ↓ | 24.1 | 25.61 |
| Entropy ↑ | 5.15 | 5.20 |
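
Gen. PPL is generative perplexity: the generated samples are scored under an external autoregressive LM, and entropy measures the token diversity of the samples. A minimal sketch of the Gen. PPL computation, assuming `gpt2-large` as the scoring model and a `"text"` field in the samples file (both assumptions; `eval_lightning.py` in the code repo defines the actual protocol):

```python
import json
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ASSUMPTION: gpt2-large as the external scorer; the repo's eval script
# may use a different model, context length, or batching.
tok = AutoTokenizer.from_pretrained("gpt2-large")
lm = AutoModelForCausalLM.from_pretrained("gpt2-large").eval().cuda()

total_nll, total_tokens = 0.0, 0
with open("reproduction/eval1000/all_generated.jsonl") as f:
    for line in f:
        text = json.loads(line)["text"]  # ASSUMPTION: field name
        ids = tok(text, return_tensors="pt").input_ids.cuda()
        with torch.no_grad():
            loss = lm(ids, labels=ids).loss  # mean next-token NLL
        n = ids.numel() - 1                  # number of predictions
        total_nll += loss.item() * n
        total_tokens += n

print("Gen. PPL:", math.exp(total_nll / total_tokens))
```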

## Per-epoch trajectory

| Epoch | Step | Gen. PPL | Entropy |
|-------|---------|----------|---------|
| 1 | 38 034 | 2.73¹ | 0.70¹ |
| 2 | 76 068 | 37.11 | 5.17 |
| 3 | 114 102 | 28.63 | 5.21 |
| 4 | 152 136 | 25.00 | 5.16 |
| 5 | 190 170 | 25.58 | 5.19 |
| 6 | 228 204 | 26.11 | 5.21 |

¹ Epoch 1 is degenerate: entropy ≈ 0.7 indicates highly repetitive text, which is also why the Gen. PPL looks misleadingly low. The run only begins producing fluent text from epoch 2 onward.

## Quickstart

```bash
pip install huggingface_hub

# Final EMA checkpoint only (recommended)
huggingface-cli download Ugness/elf-torch last.ckpt --local-dir ./elf-b/

# Then, from the code repo (https://github.com/Ugness/ELF-pytorch):
cd pytorch_lightning/
torchrun --nproc_per_node=8 --master_port=29510 eval_lightning.py \
    --config configs/training_configs/train_owt_ELF-B.yml \
    --checkpoint_path /path/to/elf-b/last.ckpt \
    --num_samples 1000
# Expected: Gen. PPL ≈ 25.6, sample entropy ≈ 5.20.
```
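
To pull the optional per-epoch checkpoints programmatically instead, `huggingface_hub` works directly. A sketch, assuming the filename pattern inferred from the two checkpoint names in the Files table (epochs 00–05, steps in multiples of 38 034; verify against the repo's file listing):

```python
from huggingface_hub import hf_hub_download

# ASSUMPTION: filename pattern inferred from the listed checkpoints
# checkpoint_epoch00_step00038034.ckpt ... checkpoint_epoch05_step00228204.ckpt
for epoch in range(6):
    path = hf_hub_download(
        repo_id="Ugness/elf-torch",
        filename=f"checkpoints/checkpoint_epoch{epoch:02d}_step{38034 * (epoch + 1):08d}.ckpt",
        local_dir="./elf-b",
    )
    print("downloaded", path)
```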

## Reproduction details

- Hardware: 8× NVIDIA B200 (sm_100), CUDA 12.8.
- Framework: PyTorch Lightning with DDP.
- Wall-clock: ~3 h/epoch × 6 epochs ≈ 18 h.
- Precision: fp32.
- Epochs: 6 (the paper used 5); one extra epoch to reach entropy ≈ 5.20.
- All other math is identical to the official JAX/Flax implementation at https://github.com/lillian039/ELF.
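
The `.ckpt` files are standard Lightning checkpoints, i.e. `torch.load`-able dicts whose top-level keys include `state_dict`, `epoch`, and `global_step`. A minimal inspection sketch (where exactly the EMA weights live depends on the repo's EMA callback, so treat the key layout as an assumption to verify):

```python
import torch

# weights_only=False: Lightning checkpoints carry non-tensor metadata.
ckpt = torch.load("elf-b/last.ckpt", map_location="cpu", weights_only=False)
print(sorted(ckpt.keys()))  # expect 'state_dict', 'epoch', 'global_step', ...
print("epoch:", ckpt.get("epoch"), "step:", ckpt.get("global_step"))

# Peek at a few parameter names/shapes from the raw state dict.
for name, tensor in list(ckpt["state_dict"].items())[:5]:
    print(name, tuple(tensor.shape))
```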

## License & citation

MIT, same as the code repo. Please cite the original paper:

```bibtex
@article{elf2026,
  title={ELF: Embedded Language Flows},
  author={Hu, Keya and Qiu, Linlu and Lu, Yiyang and Zhao, Hanhong and Li, Tianhong and Kim, Yoon and Andreas, Jacob and He, Kaiming},
  journal={arXiv preprint arXiv:2605.10938},
  year={2026}
}
```

This reproduction was heavily developed with Claude Code.
