CodeLeWM Transition Model Artifacts
This repository hosts CodeLeWM transition-model checkpoints and checkpoint manifests for code-edit and execution-world-model experiments. CodeLeWM is a research artifact for scoring and reranking candidate code states; it is not a code generator and this repository is not an inference endpoint.
What Is Hosted Here
| Artifact family | Repository path | Dataset surface | Claim posture |
|---|---|---|---|
| Early scaled code-edit transition run | checkpoints/codelewm-scaled-20260520-9699b53 | abdelstark/codelewm-public-shard | smoke/scaled diagnostic |
| Action-use margin run | checkpoints/codelewm-action-use-20260520-6650183 | abdelstark/codelewm-public-shard | negative action-use result |
| Action-use retrieval run | checkpoints/codelewm-action-use-retrieval-20260520-7895d18 | abdelstark/codelewm-public-shard | negative against no-action control |
| v0.2 action-swap/inverse-action run | checkpoints/codelewm-v0-2-action-swap-rerun-20260520-7c7cb0b | abdelstark/codelewm-public-shard | negative action-use and representation result |
| v0.8 execution checkpoints, seeds 42 and 1729 | checkpoints/codelewm-v0-8-short-execution-20260605-1b737e4-seed-{42,1729} | abdelstark/codelewm-execution-pack | mixed diagnostic execution evidence |
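In the v0.8 row, `seed-{42,1729}` reads as shell-style brace shorthand for two sibling checkpoint directories, one per seed. A minimal Python sketch, assuming only that path pattern (the helper names `checkpoint_dirs` and `include_patterns` are hypothetical and not part of the CodeLeWM CLI), expands it into concrete paths and into `--include` globs of the form used in the Verification section:

```python
# Hypothetical helpers: expand the `seed-{42,1729}` shorthand into one
# directory per seed and one `hf download --include` glob per directory.
BASE = "checkpoints/codelewm-v0-8-short-execution-20260605-1b737e4"
SEEDS = (42, 1729)


def checkpoint_dirs(base: str = BASE, seeds: tuple[int, ...] = SEEDS) -> list[str]:
    """Return one checkpoint directory path per seed, in the given seed order."""
    return [f"{base}-seed-{seed}" for seed in seeds]


def include_patterns(base: str = BASE, seeds: tuple[int, ...] = SEEDS) -> list[str]:
    """Return a glob per seed that matches every file under that directory."""
    return [f"{path}/**" for path in checkpoint_dirs(base, seeds)]


if __name__ == "__main__":
    for pattern in include_patterns():
        print(pattern)
```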
The final v0.9 seed-42 and seed-1729 execution runs are published in
abdelstark/codelewm-runs, not in this model repository. They are documented
in the CodeLeWM release cards and the final public artifact index.
Dataset Information
CodeLeWM uses two public dataset surfaces:
| Dataset | Role |
|---|---|
| abdelstark/codelewm-public-shard | Historical public-safe Python code-edit transition shards used by the scaled/action-use runs. |
| abdelstark/codelewm-execution-pack | Current execution-substrate pack of 2,188 tokenized (code, input, output) records at revision v0.9.0-rc1. |
The execution pack records a deterministic sandbox policy, source provenance, split policy, checksums, and a claim boundary. It contains tokenized code, inputs, outputs, and metadata; training and scoring do not execute candidate code.
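Because the pack records checksums, its files can be integrity-checked locally before use, and the check only reads bytes. A minimal sketch, assuming the checksums are SHA-256 hex digests keyed by relative file path (the real manifest layout and file names in abdelstark/codelewm-execution-pack may differ, and `sha256_file` and `verify_checksums` are hypothetical helpers):

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large shards never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(root: Path, expected: dict[str, str]) -> dict[str, bool]:
    """Map each relative path to True only if the file exists and its digest matches.

    `expected` is an assumed {relative_path: sha256_hex} mapping; the pack may
    store its checksums in a different structure.
    """
    results: dict[str, bool] = {}
    for rel_path, want in expected.items():
        path = root / rel_path
        # Hashing reads bytes only; no record content is executed.
        results[rel_path] = path.is_file() and sha256_file(path) == want.lower()
    return results


if __name__ == "__main__":
    # Self-contained demo on a throwaway file, not real pack data.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "records.jsonl").write_bytes(b'{"code": "..."}\n')
        recorded = {"records.jsonl": sha256_file(root / "records.jsonl")}
        print(verify_checksums(root, recorded))
```

Missing files are reported as failures rather than raising, so one pass over the recorded checksums yields a complete pass/fail map.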
Intended Use
- Reproduce CodeLeWM checkpoint loading, retrieval, surprise, scorer-quality, and reranking diagnostics.
- Compare transition-model scores against no-action, shuffled-action, lexical, random, and LLM-order controls.
- Inspect checkpoint manifests and trust-gate metadata before using a checkpoint in local scoring.
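Comparing transition-model scores against controls comes down to asking whether the model's top-ranked candidate is correct more often than each control's. A minimal sketch, assuming each candidate carries a pass/fail label and one score per method; the `Candidate` type, the method names, and the top-1 pass-rate metric are illustrative choices, not the CodeLeWM evaluation code or its reported metrics:

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One candidate code state for a task (illustrative structure)."""

    passed: bool  # ground-truth label, e.g. whether its tests passed
    scores: dict[str, float] = field(default_factory=dict)  # method -> score, higher is preferred


def top1_pass_rate(tasks: list[list[Candidate]], method: str) -> float:
    """Fraction of tasks whose highest-scored candidate under `method` passed."""
    if not tasks:
        raise ValueError("need at least one task")
    hits = 0
    for candidates in tasks:
        # max() keeps the first candidate on ties, so ties fall back to input order.
        best = max(candidates, key=lambda c: c.scores[method])
        hits += best.passed
    return hits / len(tasks)


def margins_vs_controls(
    tasks: list[list[Candidate]],
    model: str = "model",
    controls: tuple[str, ...] = ("no_action", "shuffled_action", "lexical", "random", "llm_order"),
) -> dict[str, float]:
    """Model top-1 pass rate minus each control's; a margin <= 0 means no gain over that control."""
    model_rate = top1_pass_rate(tasks, model)
    return {name: model_rate - top1_pass_rate(tasks, name) for name in controls}
```

Every control fits the same score column: a random control can be random scores, and an LLM-order control can be the negated position in the LLM's original ordering.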
Out Of Scope
- Generating code.
- Claiming broad coding improvement or live patch utility.
- Treating a green manifest, demo, or checkpoint load as a model-quality claim.
- Loading checkpoints without the CodeLeWM checkpoint trust gates and manifest verification.
Claim Boundary
The tested code-edit action-use interventions produced negative results. The v0.9/v1.0 release evidence supports a narrow HumanEval WS-D diagnostic reranking slice, while the aggregate downstream claim remains closed because MBPP-Plus is saturated against no-action and lexical controls. This repository therefore supports reproducible diagnostic research, not a general claim that CodeLeWM improves coding.
Verification
Download a hosted checkpoint family and verify its manifests:
```bash
hf download abdelstark/codelewm-transition-model \
  --repo-type model \
  --include 'checkpoints/codelewm-v0-8-short-execution-20260605-1b737e4-seed-42/**' \
  --local-dir .artifacts/hf-download/codelewm-transition-model

uv run codelewm manifest verify \
  --manifest .artifacts/hf-download/codelewm-transition-model/checkpoints/codelewm-v0-8-short-execution-20260605-1b737e4-seed-42/manifest.json \
  --json

uv run codelewm secret-scan \
  .artifacts/hf-download/codelewm-transition-model/checkpoints/codelewm-v0-8-short-execution-20260605-1b737e4-seed-42 \
  --json
```
Expected result: manifest verification returns ok=true and the secret scan
returns ok=true with zero findings.
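To gate local scoring on these results in a script, the two `--json` reports can be checked programmatically. A minimal sketch, assuming each report is a JSON object with a boolean `ok` field and that the secret scan lists its results under a `findings` key (both field names are assumptions inferred from the expected result above, not a documented schema; `run_json_report` and `report_ok` are hypothetical helpers):

```python
import json
import subprocess


def run_json_report(args: list[str]) -> str:
    """Run a CLI command that was given --json and return its stdout."""
    # check=False: a failing command is judged by its report, not by an exception.
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.stdout


def report_ok(raw_json: str, require_no_findings: bool = False) -> bool:
    """True if the report has ok == true and, if requested, an empty findings list.

    Assumed schema: a JSON object with a boolean `ok` and, for the secret
    scan, a `findings` list. A missing `findings` key counts as a failure.
    """
    report = json.loads(raw_json)
    if not isinstance(report, dict) or report.get("ok") is not True:
        return False
    if require_no_findings:
        findings = report.get("findings")
        return isinstance(findings, list) and len(findings) == 0
    return True


# Example (requires a CodeLeWM checkout and the downloaded seed-42 checkpoint):
# ckpt = ".artifacts/hf-download/codelewm-transition-model/checkpoints/codelewm-v0-8-short-execution-20260605-1b737e4-seed-42"
# manifest_ok = report_ok(run_json_report(
#     ["uv", "run", "codelewm", "manifest", "verify", "--manifest", f"{ckpt}/manifest.json", "--json"]))
# scan_ok = report_ok(run_json_report(
#     ["uv", "run", "codelewm", "secret-scan", ckpt, "--json"]), require_no_findings=True)
```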
Primary References
- Code repository: https://github.com/AbdelStark/CodeLeWM
- Execution dataset: https://huggingface.co/datasets/abdelstark/codelewm-execution-pack
- Historical code-edit shard: https://huggingface.co/datasets/abdelstark/codelewm-public-shard
- Run artifacts: https://huggingface.co/datasets/abdelstark/codelewm-runs
- Final artifact index: docs/benchmark/PUBLIC_ARTIFACT_INDEX_2026-06-08.md