etpi-phase1 — Structured-CoT SFT seed for Qwen 3.6 27B

etpi-phase1 is Qwen 3.6 27B with one round of supervised fine-tuning on grammar-constrained, structured-CoT coding-agent trajectories. The adapter (LoRA rank 128, target_modules=all-linear) has been merged into the base weights — this is a normal HuggingFace model checkpoint with no PEFT runtime dependency.

This is phase 1 of a multi-phase training pipeline. Phase 2 is RLVR (GRPO) against R2E-Gym, starting from this checkpoint.

What's different about this model

The base Qwen 3.6 27B reasons in long free-form chains-of-thought (hundreds-to-thousands of tokens per turn). This SFT seed teaches the model a compact 3-slot IR inside <think>...</think>:

<think>
STATE: <one short line: current workspace state>
ACTION: <one short line: what to do next and why>
EXPECT: <one short line: what the observation should look like>
</think>
<tool_call>...</tool_call>   # or whatever the harness expects

Typical thinking-token spend per turn drops from roughly 1000+ tokens to roughly 50-100 tokens, while preserving multi-turn coherence and tool-use ability.
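As an illustration of how a harness might consume this format, here is a minimal sketch of a parser for the three IR slots. The names `ThinkIR`, `IR_PATTERN`, and `parse_think_ir` are hypothetical and not part of the model or any library, and the exact whitespace the model emits is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the 3-slot IR; whitespace handling is an assumption.
IR_PATTERN = re.compile(
    r"<think>\s*"
    r"STATE:[ \t]*(?P<state>[^\n]*)\n"
    r"ACTION:[ \t]*(?P<action>[^\n]*)\n"
    r"EXPECT:[ \t]*(?P<expect>[^\n]*)\n?"
    r"\s*</think>"
)


@dataclass
class ThinkIR:
    state: str
    action: str
    expect: str
    rest: str  # everything after </think>, e.g. the tool call


def parse_think_ir(text: str) -> Optional[ThinkIR]:
    """Return the three IR slots, or None if the block is missing or malformed."""
    match = IR_PATTERN.search(text)
    if match is None:
        return None
    return ThinkIR(
        state=match["state"].strip(),
        action=match["action"].strip(),
        expect=match["expect"].strip(),
        rest=text[match.end():].strip(),
    )
```

A harness could use the `None` result to detect turns where the model fell back to free-form reasoning.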

Intended use

Drive a multi-turn coding agent — e.g., R2E-Gym / SWE-bench / Terminal-Bench style sandboxes — where the model:

1. Reads a task instruction (GitHub issue, terminal prompt, etc.)
2. Issues tool calls in a loop (bash, file editor, search)
3. Observes results and iterates
4. Submits when verified

The IR structure compresses the thinking phase; tool calls and answers are left unconstrained.
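The loop described above can be sketched roughly as follows. This is a simplified illustration, not the harness used for training: the `<tool_call>command</tool_call>` wrapping, the `submit` sentinel, the `tool` role for observations, and the `generate` and `execute` callables are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]

# Assumed tool-call wrapping; a real harness defines its own format.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_command(reply: str) -> Optional[str]:
    """Pull the command out of the first tool call, if any."""
    match = TOOL_CALL.search(reply)
    return match.group(1).strip() if match else None


def agent_loop(
    generate: Callable[[List[Message]], str],  # wraps the model (hypothetical)
    execute: Callable[[str], str],             # runs a command in the sandbox (hypothetical)
    task: str,
    max_turns: int = 10,
) -> List[Message]:
    """Alternate model turns and tool observations until submit or the turn limit."""
    messages: List[Message] = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        command = extract_command(reply)
        if command is None or command == "submit":
            break  # no tool call, or the model decided to submit
        observation = execute(command)
        messages.append({"role": "tool", "content": observation})
    return messages
```

Passing `generate` and `execute` as callables keeps the sketch independent of any particular inference server or sandbox.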

Training details


| Setting | Value |
|---|---|
| Base model | `Qwen/Qwen3.6-27B` |
| Training data | `andthattoo/etpi-sft` (318 verified-successful multi-turn trajectories) |
| Method | LoRA r=128, α=256, dropout 0.05, target_modules=all-linear, merged at end of training |
| Optimizer | AdamW (8-bit), lr 2e-4, cosine schedule, warmup 0.03 |
| Epochs | 2 |
| Batch | per-device 1, grad accum 8 (effective batch 8) |
| Sequence length | 8192 |
| Loss masking | assistant tokens only (observations and user messages masked) |
| Other | gradient checkpointing on, bf16, Liger fused CE loss |
| Hardware | 1× H100 80GB (~40 min) |
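For a sense of scale, the settings above imply only a small number of optimizer steps. The arithmetic below is a rough sketch: it assumes "warmup 0.03" is a warmup ratio, that partial batches round up, and that standard LoRA scaling (α / r) applies. A real trainer may round slightly differently.

```python
import math

# Values taken from the training details above.
num_examples = 318
per_device_batch = 1
grad_accum = 8
num_devices = 1
epochs = 2
warmup_ratio = 0.03  # assumption: "warmup 0.03" is a ratio of total steps
lora_r, lora_alpha = 128, 256

effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch)  # assumes rounding up
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
lora_scale = lora_alpha / lora_r  # standard LoRA scaling factor

print(f"effective batch:  {effective_batch}")
print(f"optimizer steps:  {total_steps}")
print(f"warmup steps:     {warmup_steps}")
print(f"LoRA scale (α/r): {lora_scale}")
```

Under these assumptions the run amounts to about 80 optimizer steps, which is consistent with the "SFT seed" framing.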

Final metrics:

- train_loss (final): 0.096
- mean_token_accuracy: 95.87%
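Because loss is computed on assistant tokens only, a token-level metric like this would typically be measured over those same unmasked positions. The sketch below illustrates the idea with plain lists; `build_labels` and `masked_token_accuracy` are hypothetical helpers, the per-token role list is a simplification, and the usual next-token shift is omitted for clarity.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # label value conventionally skipped by PyTorch cross-entropy


def build_labels(token_ids: Sequence[int], roles: Sequence[str]) -> List[int]:
    """Keep labels for assistant tokens; mask user and observation tokens."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


def masked_token_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of unmasked positions where the prediction equals the label."""
    scored = [(p, l) for p, l in zip(predictions, labels) if l != IGNORE_INDEX]
    if not scored:
        return 0.0
    return sum(p == l for p, l in scored) / len(scored)


token_ids = [11, 12, 13, 14, 15, 16]
roles = ["user", "user", "assistant", "assistant", "tool", "assistant"]
labels = build_labels(token_ids, roles)
predictions = [0, 0, 13, 99, 0, 16]  # made-up model predictions
accuracy = masked_token_accuracy(predictions, labels)
```

Here only the three assistant positions are scored, so wrong predictions on user or observation tokens do not affect the result.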

Data provenance

Trajectories were generated by running grammar-constrained Qwen 3.6 27B as a multi-turn agent against R2E-Gym-Lite (real GitHub issue tasks). The grammar enforced the IR structure inside <think> while leaving tool calls free. Each task was rolled out 4 times (K=4 best-of-N). Only reward=1.0 trajectories (passed the R2E unit-test verifier) were kept; among those, the shortest trajectory per task was selected as the SFT target — encoding a brevity-given-correctness preference directly into the data.
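The selection rule described above can be sketched as a small filter. The `Rollout` fields and the use of token count as the length measure are assumptions made for illustration; the source does not state whether "shortest" was measured in tokens or turns.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Rollout:
    task_id: str
    reward: float    # 1.0 means the unit-test verifier passed
    num_tokens: int  # assumed length measure


def select_sft_targets(rollouts: Iterable[Rollout]) -> Dict[str, Rollout]:
    """Keep the shortest reward=1.0 rollout per task; drop tasks with no success."""
    best: Dict[str, Rollout] = {}
    for rollout in rollouts:
        if rollout.reward != 1.0:
            continue  # failed rollouts are never used as SFT targets
        current = best.get(rollout.task_id)
        if current is None or rollout.num_tokens < current.num_tokens:
            best[rollout.task_id] = rollout
    return best
```

A task whose four rollouts all fail contributes nothing to the dataset, which is why the final set can be smaller than the number of tasks attempted.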

See andthattoo/etpi-sft for the full dataset including system prompt, task instructions, and per-turn messages.

Limitations

- Small training set (318 examples). This is SFT-seed scale, not full SFT. The model is expected to generalize the IR format rather than acquire new capabilities. The phase-2 RL run is where capability climbs.
- Trained on one task distribution (R2E-Gym, Python SWE-style issues). Performance on other languages or task types is untested.
- The R2E paper authors use the Terminus 2 scaffold with an 80k-token-per-turn budget to report 77.2% on SWE-bench Verified for the base Qwen 3.6 27B. This model is intentionally trained under a tighter scaffold (bash-only loop, IR-constrained thinking) with a different efficiency objective, so direct numerical comparison is not apples to apples.
- Phase 1 only. No RL has been applied. The expected pass@1 lift on R2E-Gym from SFT alone is modest; the real lift comes from phase 2.

Recommended inference

The model has internalized the IR format. You can drive it with or without grammar enforcement at inference time:

from transformers import AutoModelForCausalLM, AutoTokenizer

m = AutoModelForCausalLM.from_pretrained("andthattoo/etpi-phase1", torch_dtype="bfloat16", device_map="auto")
t = AutoTokenizer.from_pretrained("andthattoo/etpi-phase1")

messages = [
    {"role": "system", "content": "You are a software-engineering agent. ..."},
    {"role": "user", "content": "<github-issue text>"},
]
prompt = t.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
ids = t(prompt, return_tensors="pt").to(m.device)
out = m.generate(**ids, max_new_tokens=512, do_sample=False)  # greedy decoding
print(t.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))

For maximum compression-safety in production, apply a GBNF grammar that enforces the IR shape on the <think> block.
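As a rough idea of what such a grammar could look like, here is a sketch in GBNF (the grammar format used by llama.cpp), held in a Python string together with a plain-Python check that mirrors it. This is not the grammar used to generate the training data. The rule names, the permissive `rest` rule, and the helper `follows_ir_shape` are assumptions, and how the grammar is passed to an inference engine depends on that engine.

```python
# Hypothetical GBNF sketch: constrain the <think> block, leave the rest free.
THINK_IR_GBNF = r'''
root   ::= think rest
think  ::= "<think>\n" state action expect "</think>\n"
state  ::= "STATE: " line
action ::= "ACTION: " line
expect ::= "EXPECT: " line
line   ::= [^\n]+ "\n"
rest   ::= [^\x00]*
'''

SLOT_PREFIXES = ("STATE: ", "ACTION: ", "EXPECT: ")


def follows_ir_shape(text: str) -> bool:
    """Post-hoc check that the output starts with a block of the shape sketched above."""
    lines = text.split("\n")
    if len(lines) < 6:  # five block lines plus whatever follows "</think>\n"
        return False
    if lines[0] != "<think>" or lines[4] != "</think>":
        return False
    # Each slot line needs its prefix followed by at least one character.
    return all(
        line.startswith(prefix) and len(line) > len(prefix)
        for line, prefix in zip(lines[1:4], SLOT_PREFIXES)
    )
```

The Python check is useful when running without grammar enforcement: it flags turns that drift from the IR so they can be retried or logged.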

License

Apache 2.0 (matches base Qwen 3.6 27B license).

Acknowledgements

- Base model: Qwen team
- Training environment: R2E-Gym
- Inference: SGLang
- Tooling: TRL, PEFT, Liger Kernel, bitsandbytes
