Quark-72M-Instruct
Quark-72M-Instruct is a compact autoregressive language model trained by ThingAI.
Model Details
| Parameter | Value |
|---|---|
| Parameters | 71.6M |
| Architecture | Decoder-only Transformer |
| Layers | 14 |
| Hidden size | 512 |
| Attention heads | 8 (GQA, 2 KV heads) |
| FFN | SwiGLU (intermediate size 1344) |
| Norm | RMSNorm |
| Position | RoPE |
| Vocab size | 65,538 |
| Context length | 2,048 |
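As a sanity check on the table, the parameter count can be reproduced from the listed hyperparameters. The sketch below is not taken from the model's code. It assumes tied input/output embeddings, bias-free linear layers, a head dimension of 512 / 8 = 64, and two RMSNorm layers per block plus one final norm, none of which the card states explicitly. The function names and the fp16 cache assumption are likewise illustrative.

```python
def count_parameters(vocab=65_538, hidden=512, layers=14,
                     heads=8, kv_heads=2, ffn=1_344):
    """Estimate the parameter count from the model-details table.

    Assumptions (not confirmed by the card): tied embeddings, no biases,
    head_dim = hidden // heads, two RMSNorms per layer plus a final norm.
    """
    head_dim = hidden // heads                      # 64
    embeddings = vocab * hidden                     # shared with the output head
    q_and_o = 2 * hidden * (heads * head_dim)       # query and output projections
    k_and_v = 2 * hidden * (kv_heads * head_dim)    # GQA: only 2 KV heads
    swiglu = 3 * hidden * ffn                       # gate, up, and down projections
    norms = 2 * hidden                              # RMSNorm weights per layer
    per_layer = q_and_o + k_and_v + swiglu + norms
    return embeddings + layers * per_layer + hidden  # + final RMSNorm


def kv_cache_bytes(tokens=2_048, layers=14, kv_heads=2,
                   head_dim=64, bytes_per_value=2):
    """Size of the key/value cache for one sequence (fp16 assumed)."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


print(f"{count_parameters():,}")     # 71,646,720
print(kv_cache_bytes() / 2**20)      # 14.0 (MiB at the full 2,048-token context)
```

Under these assumptions the estimate rounds to the 71.6M listed above, with nearly half of the weights in the embedding matrix.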
Usage
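```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model; trust_remote_code=True runs the custom model code shipped with the repo.
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-72M-Instruct")
model = AutoModelForCausalLM.from_pretrained("ThingAI/Quark-72M-Instruct", trust_remote_code=True)

# Wrap the question in the user/assistant markers and leave the assistant turn open.
prompt = "<|user|>\nHow do I find files larger than 100MB?\n<|end|>\n<|assistant|>\n"
ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate_text is not part of the standard transformers API; presumably it comes from the remote code.
out = model.generate_text(ids, max_new_tokens=200, temperature=0.2)

# skip_special_tokens=False keeps the chat markers visible in the decoded text.
print(tokenizer.decode(out[0], skip_special_tokens=False))
```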
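The prompt layout in the usage snippet can be wrapped in two small helpers. This is a minimal sketch: `build_prompt` and `extract_reply` are hypothetical names, not part of the model's code, and `extract_reply` assumes the model closes its answer with `<|end|>`, which the card does not confirm. Only the single-turn layout shown above is covered.

```python
# Chat markers copied from the usage snippet.
USER, END, ASSISTANT = "<|user|>", "<|end|>", "<|assistant|>"


def build_prompt(user_message: str) -> str:
    """Format one user turn and leave the assistant turn open."""
    return f"{USER}\n{user_message.strip()}\n{END}\n{ASSISTANT}\n"


def extract_reply(decoded: str) -> str:
    """Return the text after the last assistant marker, cut at the end marker if present."""
    reply = decoded.rsplit(ASSISTANT, 1)[-1]
    reply = reply.split(END, 1)[0]
    return reply.strip()


prompt = build_prompt("How do I find files larger than 100MB?")
print(repr(prompt))
```

Passing the decoded output of the usage snippet to `extract_reply` would then return only the assistant's answer, without the prompt or the markers.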
Training
- Pre-training: 5B tokens of math, code, and English/Italian text
- SFT (supervised fine-tuning): Bash commands, code, and conversations (ChatML template)
- Tokenizer: byte-level BPE, 65,536-token vocabulary
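For a rough sense of scale, the pre-training budget can be expressed as tokens per parameter. The short calculation below uses only the two figures from this card (5B tokens, 71.6M parameters); the variable names are illustrative.

```python
pretraining_tokens = 5_000_000_000   # "5B tokens" from the training list
parameters = 71_600_000              # "71.6M" from the model-details table

tokens_per_parameter = pretraining_tokens / parameters
print(f"{tokens_per_parameter:.1f} tokens per parameter")  # 69.8 tokens per parameter
```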
License
MIT