Quark-72M-Instruct

Quark-72M-Instruct is a compact autoregressive language model trained by ThingAI.

Model Details

Parameter	Value
Parameters	71.6M
Architecture	Decoder-only Transformer
Layers	14
Hidden size	512
Attention heads	8 (GQA, 2 KV)
FFN	SwiGLU (1344)
Norm	RMSNorm
Position	RoPE
Vocab size	65,538
Context length	2,048
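
As a rough sanity check, the values in the table can be turned into a parameter estimate. The sketch below makes assumptions the table does not state: no bias terms, tied input/output embeddings, two RMSNorm weight vectors per layer plus a final one, and a head dimension of hidden size / attention heads. The function name `estimate_params` is hypothetical and not part of the model's code.

```python
def estimate_params(vocab=65_538, hidden=512, layers=14,
                    heads=8, kv_heads=2, ffn=1344):
    """Estimate the parameter count from the Model Details table.

    Assumptions (not confirmed by the model card): no biases,
    tied input/output embeddings, RMSNorm before attention and FFN.
    """
    head_dim = hidden // heads                  # 512 / 8 = 64
    embed = vocab * hidden                      # token embedding (shared with the output head)
    attn = (hidden * hidden                     # Q projection
            + 2 * hidden * kv_heads * head_dim  # K and V projections (GQA, 2 KV heads)
            + hidden * hidden)                  # output projection
    mlp = 3 * hidden * ffn                      # SwiGLU: gate, up, and down matrices
    norms = 2 * hidden                          # two RMSNorm weight vectors per layer
    final_norm = hidden
    return embed + layers * (attn + mlp + norms) + final_norm


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate lands at roughly 71.6M, in line with the figure in the table. Nearly half of that total sits in the embedding matrix, which is typical for a small model with a large vocabulary.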

Usage

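from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# trust_remote_code=True allows transformers to run the custom model code shipped with the checkpoint
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-72M-Instruct")
model = AutoModelForCausalLM.from_pretrained("ThingAI/Quark-72M-Instruct", trust_remote_code=True)

# Chat format: a user turn closed by <|end|>, followed by an open assistant turn
prompt = "<|user|>\nHow do I find files larger than 100MB?\n<|end|>\n<|assistant|>\n"
ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate_text is not a standard transformers method; it is presumably provided by the model's custom code
with torch.no_grad():  # inference only, no gradients needed
    out = model.generate_text(ids, max_new_tokens=200, temperature=0.2)

# Keep special tokens so the turn markers stay visible in the output
print(tokenizer.decode(out[0], skip_special_tokens=False))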
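
The prompt string above follows a simple turn format, so it can be assembled and parsed programmatically. The helpers below are a minimal sketch based only on that example: `build_prompt` and `extract_reply` are hypothetical names, and the sketch assumes that earlier assistant turns use the same `<|role|>` ... `<|end|>` layout and that the model closes its reply with `<|end|>`.

```python
def build_prompt(messages):
    """Render a list of {"role", "content"} dicts in the turn format shown above.

    Assumption: every turn, including earlier assistant turns, is written as
    <|role|>\\n{content}\\n<|end|>\\n, and the prompt ends with an open assistant turn.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}\n<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")  # leave the assistant turn open for generation
    return "".join(parts)


def extract_reply(decoded):
    """Return the text of the last assistant turn from a decoded generation.

    Assumption: the reply ends at the first <|end|> marker, if one is present.
    """
    reply = decoded.rsplit("<|assistant|>\n", 1)[-1]
    return reply.split("<|end|>", 1)[0].strip()


prompt = build_prompt([
    {"role": "user", "content": "How do I find files larger than 100MB?"},
])
print(repr(prompt))
```

With `skip_special_tokens=False`, the decoded output contains the prompt as well as the reply, so passing it through `extract_reply` is one way to keep only the model's answer.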

Training

Pre-training: 5B tokens of math, code, and English/Italian text
SFT: bash commands, code, and conversations (ChatML template)
Tokenizer: byte-level BPE, 65,536-token vocabulary

License

MIT

Safetensors

Model size: 71.7M params
Tensor type: F32
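
For a sense of the memory footprint these figures imply, the sketch below does the arithmetic for F32 weights and for a full-context KV cache. It assumes the cache is also stored in F32, a batch size of 1, and a head dimension of 64 (hidden size 512 / 8 heads); the function names are hypothetical.

```python
def weight_bytes(n_params=71_700_000, bytes_per_value=4):
    """Memory needed for the weights alone (F32 = 4 bytes per value)."""
    return n_params * bytes_per_value


def kv_cache_bytes(layers=14, kv_heads=2, head_dim=64,
                   context=2048, bytes_per_value=4, batch=1):
    """Memory for keys and values across all layers at full context length."""
    # Factor 2 covers the separate key and value tensors in each layer
    return 2 * layers * kv_heads * head_dim * context * bytes_per_value * batch


MIB = 1024 ** 2
print(f"weights:              {weight_bytes() / MIB:.1f} MiB")
print(f"KV cache (2 KV heads): {kv_cache_bytes() / MIB:.1f} MiB")
print(f"KV cache (8 KV heads): {kv_cache_bytes(kv_heads=8) / MIB:.1f} MiB")
```

The last line shows what the cache would cost without GQA: with 2 KV heads instead of 8, the cache is a quarter of the size.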