etpi-phase1 — Structured-CoT SFT seed for Qwen 3.6 27B
etpi-phase1 is Qwen 3.6 27B with one round of supervised fine-tuning
on grammar-constrained, structured-CoT coding-agent trajectories. The
adapter (LoRA rank 128, target_modules=all-linear) has been merged into
the base weights — this is a normal HuggingFace model checkpoint with
no PEFT runtime dependency.
This is phase 1 of a multi-phase training pipeline. Phase 2 is RLVR (GRPO) against R2E-Gym, starting from this checkpoint.
What's different about this model
The base Qwen 3.6 27B reasons in long free-form chains-of-thought
(hundreds-to-thousands of tokens per turn). This SFT seed teaches the
model a compact 3-slot IR inside <think>...</think>:
<think>
STATE: <one short line: current workspace state>
ACTION: <one short line: what to do next and why>
EXPECT: <one short line: what the observation should look like>
</think>
<tool_call>...</tool_call> # or whatever the harness expects
Typical thinking-token spend per turn drops from ~1000+ tokens to ~50-100 tokens while preserving multi-turn coherence and tool-use ability.
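As an illustration of the 3-slot IR shown above, the following is a minimal sketch of a checker that extracts the three slots from a model turn. The regular expression, the name `parse_ir`, and the assumption that each slot sits on exactly one line in the order STATE, ACTION, EXPECT are illustrative choices, not part of the released model or its training harness.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: three single-line slots, in order, inside <think>...</think>.
IR_PATTERN = re.compile(
    r"<think>\s*"
    r"STATE:[ \t]*(?P<state>[^\n]+)\n"
    r"ACTION:[ \t]*(?P<action>[^\n]+)\n"
    r"EXPECT:[ \t]*(?P<expect>[^\n]+)\n?"
    r"\s*</think>"
)


def parse_ir(text: str) -> Optional[Dict[str, str]]:
    """Return the three IR slots, or None if the turn does not follow the IR shape."""
    match = IR_PATTERN.search(text)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


if __name__ == "__main__":
    turn = (
        "<think>\n"
        "STATE: tests fail in utils.py\n"
        "ACTION: open utils.py to inspect parse()\n"
        "EXPECT: a function with an off-by-one slice\n"
        "</think>\n"
        "<tool_call>...</tool_call>"
    )
    print(parse_ir(turn))
    # A free-form chain of thought does not match the IR shape.
    print(parse_ir("<think>Let me think about this step by step...</think>"))
```

A check like this could be used to measure how often the model stays in format when no grammar is enforced.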
Intended use
Drive a multi-turn coding agent — e.g., R2E-Gym / SWE-bench / Terminal-Bench style sandboxes — where the model:
- Reads a task instruction (GitHub issue, terminal prompt, etc.)
- Issues tool calls in a loop (bash, file editor, search)
- Observes results and iterates
- Submits when verified
The IR structure compresses the thinking phase; tool calls and answers are left unconstrained.
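The loop described above can be sketched roughly as follows. This is an assumption-laden outline, not the harness used for this model: `run_agent`, the three callables, the `tool` role for observations, and the rule that a turn without a tool call counts as the final submission are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def run_agent(
    task: str,
    generate: Callable[[List[Message]], str],            # hypothetical: wraps the model call
    extract_tool_call: Callable[[str], Optional[str]],   # hypothetical: harness-specific parser
    run_tool: Callable[[str], str],                      # hypothetical: sandbox executor
    system_prompt: str = "You are a software-engineering agent.",
    max_turns: int = 20,
) -> List[Message]:
    """Run a read / act / observe loop and return the full message history."""
    messages: List[Message] = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        call = extract_tool_call(reply)
        if call is None:
            # Assumption: a turn with no tool call is treated as the final submission.
            break
        observation = run_tool(call)
        messages.append({"role": "tool", "content": observation})
    return messages
```

Passing the model call, the tool-call parser, and the sandbox as callables keeps the sketch independent of any particular tool-call format, which the card leaves to the harness.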
Training details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Training data | andthattoo/etpi-sft (318 verified-successful multi-turn trajectories) |
| Method | LoRA r=128, α=256, dropout 0.05, target_modules=all-linear, merged at end of training |
| Optimizer | AdamW (8-bit), lr 2e-4, cosine schedule, warmup 0.03 |
| Epochs | 2 |
| Batch | per-device 1, grad accum 8 (effective batch 8) |
| Sequence length | 8192 |
| Loss masking | assistant tokens only (observations and user messages masked) |
| Other | gradient checkpointing on, bf16, Liger fused CE loss |
| Hardware | 1× H100 80GB (~40 min) |
Final metrics:
- train_loss (final): 0.096
- mean_token_accuracy: 95.87%
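For a sense of scale, the batch and epoch settings listed under Training details imply only a small number of optimizer steps. The sketch below works through that arithmetic under stated assumptions: one trajectory per sequence (no packing), a partial final batch counted as a step, and "warmup 0.03" read as a warmup ratio. The function name `estimate_schedule` is hypothetical, and the actual trainer may round differently.

```python
import math
from typing import Dict


def estimate_schedule(
    num_examples: int,
    per_device_batch: int,
    grad_accum: int,
    epochs: int,
    warmup_ratio: float,
    num_devices: int = 1,
) -> Dict[str, int]:
    """Rough optimizer-step estimate; assumes no sequence packing."""
    effective_batch = per_device_batch * grad_accum * num_devices
    # Assumption: a partial final batch still counts as one optimizer step.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


if __name__ == "__main__":
    # 318 trajectories, per-device batch 1, grad accum 8, 2 epochs, warmup 0.03
    print(estimate_schedule(318, 1, 8, 2, 0.03))
    # {'effective_batch': 8, 'steps_per_epoch': 40, 'total_steps': 80, 'warmup_steps': 3}
```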
Data provenance
Trajectories were generated by running grammar-constrained
Qwen 3.6 27B as a multi-turn agent against R2E-Gym-Lite (real GitHub
issue tasks). The grammar enforced the IR structure inside <think>
while leaving tool calls free. Each task was rolled out 4 times
(best-of-N with K=4). Only reward=1.0 trajectories (those that passed the
R2E unit-test verifier) were kept; among those, the shortest trajectory
per task was selected as the SFT target, which encodes a
brevity-given-correctness preference directly into the data.
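The selection rule described above (keep only reward=1.0 rollouts, then take the shortest one per task) can be sketched as below. The `Rollout` record, its `num_tokens` field, and `select_sft_targets` are hypothetical names; in particular, the card does not say whether "shortest" is measured in tokens, turns, or characters, so token count is an assumption here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Rollout:
    task_id: str
    reward: float
    num_tokens: int  # assumption: length is measured in tokens


def select_sft_targets(rollouts: Iterable[Rollout]) -> Dict[str, Rollout]:
    """Keep the shortest fully successful rollout for each task."""
    best: Dict[str, Rollout] = {}
    for rollout in rollouts:
        if rollout.reward != 1.0:
            continue  # drop anything that did not pass the verifier
        current = best.get(rollout.task_id)
        if current is None or rollout.num_tokens < current.num_tokens:
            best[rollout.task_id] = rollout
    return best


if __name__ == "__main__":
    rollouts = [
        Rollout("task-a", 1.0, 900),
        Rollout("task-a", 1.0, 400),
        Rollout("task-a", 0.0, 100),  # short but failed, so it is ignored
        Rollout("task-b", 0.0, 300),  # task-b has no successful rollout
    ]
    for task_id, rollout in select_sft_targets(rollouts).items():
        print(task_id, rollout.num_tokens)
```

Tasks with no successful rollout drop out of the result entirely, so the final dataset has at most one trajectory per task.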
See andthattoo/etpi-sft
for the full dataset including system prompt, task instructions, and
per-turn messages.
Limitations
- Small training set (318 examples). This is SFT-seed scale, not a full SFT run. The model is expected to learn the IR format rather than acquire new capabilities; capability gains are expected from the phase-2 RL run.
- Trained on one task distribution (R2E-Gym, Python SWE-style issues). Performance on other languages or task types is untested.
- The R2E paper authors report 77.2% on SWE-bench Verified for the base Qwen 3.6 27B using the Terminus 2 scaffold with an 80k-token-per-turn budget. This model is intentionally trained under a tighter scaffold (bash-only loop, IR-constrained thinking) with a different efficiency objective, so a direct numerical comparison is not apples to apples.
- Phase 1 only. No RL has been applied. The expected pass@1 lift on R2E-Gym from SFT alone is modest; the real lift comes from phase 2.
Recommended inference
The model has internalized the IR format. You can drive it with or without grammar enforcement at inference time:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint; no PEFT runtime is needed.
m = AutoModelForCausalLM.from_pretrained("andthattoo/etpi-phase1", torch_dtype="bfloat16", device_map="auto")
t = AutoTokenizer.from_pretrained("andthattoo/etpi-phase1")
messages = [
    {"role": "system", "content": "You are a software-engineering agent. ..."},
    {"role": "user", "content": "<github-issue text>"},
]
# Render the chat template to text, then tokenize it.
prompt = t.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
ids = t(prompt, return_tensors="pt").to(m.device)
# Greedy decoding: temperature=0.0 is rejected when sampling is enabled, so disable sampling instead.
out = m.generate(**ids, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens.
print(t.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
To guarantee that the thinking block stays in the compact IR format in
production, apply a GBNF grammar that enforces the IR shape on the <think> block.
License
Apache 2.0 (matches base Qwen 3.6 27B license).
Acknowledgements