MIMIC: Melee Imitation Model for Input Cloning
Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays. Eight character-specific ~20M-parameter transformers that take a 180-frame window of game state and output controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz. Each model plays over Slippi Online Direct Connect through Dolphin + libmelee.
- Repo: https://github.com/erickfm/MIMIC
- Training data: erickfm/melee-ranked-replays, ranked Slippi replays (master/diamond/platinum tier) per character.
- Base architecture: Shaw-relative-position causal transformer (d_model=512, 6 layers, 8 heads, seq_len=180). Bootstrapped from HAL (Eric Gu) and since diverged.
- Defining MIMIC changes over HAL: a 7-class button head with a distinct TRIG class for airdodge/wavedash (HAL's 5-class head can't represent airdodge and thus can't wavedash); v2 shard alignment that fixes a subtle post-frame-gamestate leak in the training targets (see research-notes-2026-04-11c); and the digital-L-press fix in decode_and_press (research notes 2026-04-13), without which no 7-class BC bot wavedashes.
Current checkpoints (retrained on 2026-04-20 baseline)
Retrained on the post-schema-drop (13 numeric cols), new-transforms
(tanh_scale / linear_max / log_max for velocity / hitlag /
hitstun) basis. See research-notes-2026-04-20.md in the MIMIC repo
for methodology + results analysis.
| Character | Run | Train games | Val loss | Step |
|---|---|---|---|---|
| Fox | fox-20260420-baseline | 31,030 | 0.7144 | 32768 |
| Falco | falco-20260420-baseline | 20,882 | 0.7487 | 31392 |
| Marth | marth-20260420-baseline | 11,759 | 0.6664 | 31065 |
| Sheik | sheik-20260420-baseline | 51,751 | 0.6566 | 26160 |
| Captain Falcon | cptfalcon-20260420-baseline | 17,557 | 0.7368 | watchdog |
| Luigi | luigi-20260420-baseline | 2,290 | 0.7460 | watchdog |
Peach, Jigglypuff, and Ice Climbers remain on pre-2026-04-20 schemas:
- peach-20260420-baseline (val 0.6322) was trained on the 22-col schema before the schema drop; it is loadable via its pickled config.
- puff and ice_climbers missed the 2026-04-20 retrain cycle due to a download-script bug; their existing HF checkpoints are on the old schema. These two are incompatible with the current 13-col inference code path. They will be retrained in a follow-on cycle.
Repo layout
MIMIC/
├── README.md                   # this file
├── fox/
│   ├── model.pt                # raw PyTorch checkpoint
│   ├── config.json             # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json           # provenance (step, val metrics, notes)
│   ├── mimic_norm.json         # per-feature transforms + params
│   ├── controller_combos.json  # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json         # per-column mean/std (z-score fallback)
├── falco/        (same layout)
├── marth/        (same layout)
├── sheik/        (same layout)
├── cptfalcon/    (same layout)
├── luigi/        (same layout)
├── puff/         (same layout)
├── ice_climbers/ (same layout)
└── peach/        (same layout, pre-drop schema, retrain pending)
Each character directory is self-contained: the JSONs are the exact metadata used during training, copied verbatim from the data dir so any inference script can load them without touching the MIMIC repo.
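As a rough illustration of what "self-contained" means in practice, the sketch below reads every JSON sidecar of one character directory into a single dict. The file names come from the layout above; the `load_character_metadata` helper is hypothetical and is not part of the MIMIC repo.

```python
import json
from pathlib import Path

# File names are taken from the repo layout; the loader itself is a
# hypothetical helper, not code from the MIMIC repo.
METADATA_FILES = (
    "config.json",
    "metadata.json",
    "mimic_norm.json",
    "controller_combos.json",
    "cat_maps.json",
    "stick_clusters.json",
    "norm_stats.json",
)


def load_character_metadata(char_dir):
    """Read every JSON sidecar in a character directory into one dict.

    Keys are file stems (for example "mimic_norm"). Missing files are
    reported together so a partial download fails loudly.
    """
    char_dir = Path(char_dir)
    missing = [name for name in METADATA_FILES if not (char_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{char_dir} is missing: {', '.join(missing)}")
    return {
        Path(name).stem: json.loads((char_dir / name).read_text(encoding="utf-8"))
        for name in METADATA_FILES
    }
```

Something like `load_character_metadata("hf_checkpoints/marth")` would then give an inference script everything except the weights in `model.pt`.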
Usage
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh # installs Dolphin, deps, ISO
# Download all characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
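If only one character is needed, the download can be narrowed with the `allow_patterns` argument of `snapshot_download`. This is a minimal sketch: the helper names are hypothetical, and it assumes the Hub repo uses the per-character directory names shown in the repo layout.

```python
def character_patterns(character):
    """Glob patterns selecting one character directory (e.g. "marth")."""
    return [f"{character}/*"]


def download_character(character, local_dir="./hf_checkpoints"):
    """Fetch a single character's files instead of the whole repo."""
    # Imported lazily so character_patterns() works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "erickfm/MIMIC",
        allow_patterns=character_patterns(character),
        local_dir=local_dir,
    )
```

Calling `download_character("marth")` should leave the files under `hf_checkpoints/marth/`, which is the path the commands below expect.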
Run a character against a level-9 CPU:
python3 tools/play_vs_cpu.py \
--checkpoint hf_checkpoints/marth/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/marth \
--character MARTH --cpu-character FOX --cpu-level 9 \
--stage FINAL_DESTINATION
Or play a bot over Slippi Online Direct Connect:
python3 tools/play_netplay.py \
--checkpoint hf_checkpoints/sheik/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/sheik \
--character SHEIK \
--connect-code YOUR#123
The MIMIC repo also includes a Discord bot frontend
(tools/discord_bot.py) that queues direct-connect matches per user.
See docs/discord-bot-setup.md.
Architecture
Slippi frame ──► MimicFlatEncoder (Linear 184→512) ──► 512-d per-frame vector
                                                              │
180-frame window ──► + Shaw Relative-Position attention ──────┘
                                  │
6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads, d_ff=2048, GELU, LN)
                                  │
Autoregressive Output Heads (with detach)
                                  │
       ┌──────────────┼───────────────┬────────────┐
  shoulder(3)    c_stick(9)    main_stick(37)   buttons(7)
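The dimensions in the diagram can be sanity-checked against the ~20M parameter figure with a back-of-the-envelope count. This is only an estimate under stated assumptions: it counts the weight matrices of the encoder and the six blocks, and ignores biases, layer norms, relative-position embeddings, and the output heads, whose exact shapes are not given here.

```python
D_MODEL = 512
D_FF = 2048
N_LAYERS = 6
INPUT_DIM = 184


def block_weight_count(d_model, d_ff):
    """Weight-matrix parameters in one transformer block (no biases/norms)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # up and down projections
    return attention + feed_forward


def estimate_params():
    """Encoder projection plus all blocks; heads and embeddings are omitted."""
    encoder = INPUT_DIM * D_MODEL
    blocks = N_LAYERS * block_weight_count(D_MODEL, D_FF)
    return encoder + blocks
```

`estimate_params()` comes to 18,968,576 weights, so the omitted terms would account for the remainder of the quoted ~20M.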
7-class button head
| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |
HAL's original 5-class head (A / B / Jump / Z / None) has no TRIG class
and structurally can't execute airdodge, which means HAL-lineage bots
can't wavedash. MIMIC's 7-class encoding plus a fix for
decode_and_press (which was silently dropping the digital L press
until 2026-04-13) is what enables the wavedashing in the replays.
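To make the class table concrete, here is a hedged sketch of mapping a predicted class index to a set of buttons to hold. It is not the repo's `decode_and_press`; the choice of X for JUMP and digital L for TRIG is an assumption made for illustration, since the table allows either X or Y and either L or R.

```python
# Class index -> buttons to hold this frame. X for JUMP and L for TRIG are
# illustrative assumptions; the table allows "X or Y" and "L or R".
BUTTON_CLASSES = {
    0: ("A",),
    1: ("B",),
    2: ("Z",),
    3: ("X",),        # JUMP
    4: ("L",),        # TRIG: the digital press needed for airdodge/wavedash
    5: ("A", "L"),    # A_TRIG: shield grab
    6: (),            # NONE
}


def decode_button_class(class_index):
    """Return the tuple of button names for a 7-class prediction."""
    try:
        return BUTTON_CLASSES[class_index]
    except KeyError:
        raise ValueError(f"button class must be 0-6, got {class_index}") from None
```

A 5-class head has no entry that yields `("L",)`, which is the structural gap described above.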
Input features (per frame, per player)
Numeric (13):
pos_x, pos_y, percent, stock, jumps_left,
speed_air_x_self, speed_ground_x_self,
speed_x_attack, speed_y_attack, speed_y_self,
hitlag_left, hitstun_left,
shield_strength
Flags (5):
on_ground, off_stage, facing, invulnerable, moonwalkwarning
Per-feature normalization is defined in each character's
mimic_norm.json. The active transforms are:
| transform | formula | used for |
|---|---|---|
| normalize | 2(x-min)/(max-min) - 1 ∈ [-1, +1] | percent, stock, jumps_left, facing, invulnerable, on_ground |
| standardize | (x - mean) / std | pos_x, pos_y |
| invert_normalize | 2(max-x)/(max-min) - 1 | shield_strength (so "shield broken" is +1) |
| tanh_scale | tanh(x / scale) | 5 velocities (scale=5 for self, scale=10 for attack) |
| linear_max | x / max | hitlag_left (max=20) |
| log_max | log1p(clamp(x,0,max)) / log1p(max) | hitstun_left (max=120) |
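The formulas in the table translate directly into scalar Python. This is a sketch of the math only; the function signatures are assumptions, and the real parameters for each feature live in `mimic_norm.json`.

```python
import math


def normalize(x, lo, hi):
    """Map [lo, hi] linearly onto [-1, +1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def standardize(x, mean, std):
    """Classic z-score."""
    return (x - mean) / std


def invert_normalize(x, lo, hi):
    """Like normalize, but flipped so the minimum maps to +1."""
    return 2.0 * (hi - x) / (hi - lo) - 1.0


def tanh_scale(x, scale):
    """Squash an unbounded value into (-1, +1)."""
    return math.tanh(x / scale)


def linear_max(x, max_value):
    """Divide by a fixed maximum."""
    return x / max_value


def log_max(x, max_value):
    """Clamp to [0, max_value], then compress with log1p onto [0, 1]."""
    clamped = min(max(x, 0.0), max_value)
    return math.log1p(clamped) / math.log1p(max_value)
```

For example, `linear_max(10, 20)` gives 0.5, and `log_max` saturates at 1.0 for any hitstun value at or above its max of 120.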
Plus categorical embeddings: stage (4d), 2× character (12d), 2× action (32d). Plus the previous-frame controller state as a 56-dim one-hot (37 stick + 9 c-stick + 7 button + 3 shoulder).
Total input per frame: 184 dimensions → projected to 512.
Earlier builds (pre-2026-04-20) used a 22-col numeric schema that
included invuln_left and 8 ECB corners. Those columns turned out to
be structurally zero for our .slp parse path (libmelee never
populates them), so they were dropped from the schema. See research
notes 2026-04-20 for the audit. Checkpoints trained pre-drop
(peach-20260420-baseline) still load via their own pickled config
but use the 202-dim projection path.
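Both the 184-dim and 202-dim figures can be reproduced from the feature lists above. The sketch assumes that numeric columns and flags are counted once per player for two players, while the stage embedding and the previous-frame controller one-hot are counted once; that reading is an inference from the numbers, not something the schema files are quoted as saying.

```python
N_PLAYERS = 2
FLAGS = 5                            # on_ground, off_stage, facing, ...
CATEGORICAL = 4 + 2 * 12 + 2 * 32    # stage + 2x character + 2x action
CONTROLLER = 37 + 9 + 7 + 3          # main stick + c-stick + buttons + shoulder


def frame_input_dim(numeric_cols):
    """Width of the flat per-frame input for a given numeric schema size."""
    per_player = numeric_cols + FLAGS
    return N_PLAYERS * per_player + CATEGORICAL + CONTROLLER
```

With the current schema, `frame_input_dim(13)` is 184; with the pre-drop schema, `frame_input_dim(22)` is 202, since the 9 dropped columns (invuln_left plus 8 ECB corners) appear once per player.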
Training
- Model preset: mimic (20M params)
- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR to eta_min=1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 180 frames (~3 seconds)
- Batch size: 256 per-GPU × 2 RTX 5090s × grad-accum 1 = eff-batch 512
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention (prevents BF16 overflow in the manual Q@Kᵀ + S_rel computation)
- Max samples: 16.78M (≈ 32,768 steps at eff-batch 512)
- Watchdog: patience=12 evals on val-plateau, so some chars finish early
- Reaction delay: 0. v2 shards have target[i] = buttons[i+1], so rd=0 matches inference. Do NOT use --reaction-delay 1 or --controller-offset with v2 shards.
- --self-inputs is required even on v2 shards. Runs without it drop the controller-history input entirely and land at val loss ~2.3.
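The v2 alignment rule from the list above, target[i] = buttons[i+1], can be illustrated with a small windowing sketch. This is not the repo's sharding code: the function name, the stride of one frame, and the plain-list inputs are all assumptions chosen to keep the example short.

```python
def make_v2_examples(states, buttons, seq_len=180):
    """Yield (window_states, window_targets) pairs with a one-frame shift.

    For position i in a window, the target is the button label of the
    following frame, so no extra reaction delay is applied on top.
    """
    if len(states) != len(buttons):
        raise ValueError("states and buttons must have the same length")
    # The last window must leave one frame of lookahead for its final target.
    last_start = len(states) - seq_len - 1
    for start in range(last_start + 1):
        window = states[start:start + seq_len]
        targets = buttons[start + 1:start + seq_len + 1]
        yield window, targets
```

Because the shift is already baked into the targets, adding a further offset at train time would misalign the labels relative to what the model sees at inference.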
Typical wall-clock per char on 2×RTX 5090: 10-15 min download/extract + 20 min parallel norm_stats bootstrap + 45-120 min sharding (depending on char; cptfalcon and sheik are the longest) + ~50 min training = 2-4 hours.
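The batch-size and step-count figures in the training settings, along with the cosine schedule, can be checked with a few lines of arithmetic. Two assumptions are made here: "16.78M" is read as 2**24 samples, and the schedule's period (T_max) is taken to equal the total step count. The closed form is the standard cosine-annealing formula.

```python
import math

PER_GPU_BATCH = 256
N_GPUS = 2
GRAD_ACCUM = 1
MAX_SAMPLES = 2 ** 24      # assumption: "16.78M" read as 16,777,216
BASE_LR = 3e-4
ETA_MIN = 1e-6


def effective_batch():
    """Samples consumed per optimizer step."""
    return PER_GPU_BATCH * N_GPUS * GRAD_ACCUM


def total_steps():
    """Optimizer steps needed to consume the sample budget."""
    return MAX_SAMPLES // effective_batch()


def cosine_lr(step, t_max):
    """Closed-form cosine annealing from BASE_LR down to ETA_MIN."""
    cosine = 1.0 + math.cos(math.pi * step / t_max)
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * cosine
```

This gives an effective batch of 512 and 32,768 steps, matching the Fox run's final step; runs that report fewer steps would have been stopped earlier.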
Known limitations
- Character-locked. Each model only plays the character it was trained on. No matchup generalization. Multi-character training with a character embedding is a natural next step but not done.
- Small-dataset overfitting on Luigi / Ice Climbers. Luigi has ~2K training games; IC around 5K. Their _bestloss.pt is early-stopped, either by the patience=12 watchdog during this cycle or by inspection in prior cycles. Play quality varies.
- Edge guarding and recovery weaknesses. Bots don't consistently go for off-stage edge guards or execute high-skill recovery mixups. The training data has these in it, but BC bots under-sample long-tail strategic decisions.
- No Matchmaking / Ranked. The Discord bot only joins explicit Direct Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked: libmelee's README explicitly forbids bots on those ladders, and Slippi has not yet opened a "bot account" opt-in system.
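The patience-based watchdog mentioned in the small-dataset item above can be pictured as a simple counter over validation evals. This is a generic early-stopping sketch, not the repo's implementation; the class name and the `min_delta` parameter are hypothetical.

```python
class PlateauWatchdog:
    """Stop training after `patience` consecutive evals without improvement."""

    def __init__(self, patience=12, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def update(self, val_loss):
        """Record one eval result; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

On a small dataset the validation loss tends to bottom out early, so a counter like this trips well before the full step budget is spent.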
Acknowledgments
- Eric Gu for HAL, the reference implementation MIMIC is based on. HAL's architecture, tokenization, and training pipeline are the foundation.
- Vlad Firoiu and collaborators for libmelee, the Python interface to Dolphin + Slippi.
- Project Slippi for the Slippi Dolphin fork, replay format, and Direct Connect rollback netplay. https://slippi.gg
License
MIT. See the MIMIC repo's LICENSE file.