OLMo-Coding/starcoder-python-instruct
CinnabarLM Python is a tiny, 4M-parameter code LLM trained for ~38 minutes on a T4 GPU (on Colab)! It's only 16 MB in size and now it's Llama-based!
Why build it? Because tiny LLMs are worth making. Others have already done it with MicroLM, Spark 4 5M and Tenete 8M, but I had not made one myself!
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 8 per device x 4 gradient accumulation steps = 32 effective |
| Context Window | Up to 2048 tokens (per max_position_embeddings) |
| hidden_size | 192 |
| intermediate_size | 192 |
| num_hidden_layers | 6 |
| num_attention_heads | 6 |
| max_position_embeddings | 2048 |
| rms_norm_eps | 1e-5 |
| initializer_range | 0.02 |
| use_cache | True |
| tie_word_embeddings | False |
| rope_theta | 10000.0 |
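
To make the architecture values above easier to reason about, here is a minimal sketch that collects them in a dictionary and estimates the parameter count. It assumes the standard Llama layout (bias-free linear layers, full multi-head attention with no grouped-query heads, a gated MLP, and two RMSNorm weights per block); `estimate_llama_params` is a hypothetical helper, not something from this model's training code. The keys use the same names as the table, so the dictionary could likely be unpacked into `transformers.LlamaConfig`, though that is not done here.

```python
# Values copied from the tables above; vocab_size is the listed vocabulary size.
config = {
    "vocab_size": 4096,
    "hidden_size": 192,
    "intermediate_size": 192,
    "num_hidden_layers": 6,
    "num_attention_heads": 6,
    "max_position_embeddings": 2048,
    "rms_norm_eps": 1e-5,
    "initializer_range": 0.02,
    "use_cache": True,
    "tie_word_embeddings": False,
    "rope_theta": 10000.0,
}


def estimate_llama_params(cfg: dict) -> int:
    """Hypothetical helper: rough parameter count for a Llama-style decoder.

    Assumes no bias terms and as many key/value heads as attention heads.
    """
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    v = cfg["vocab_size"]

    embeddings = v * h                                   # input token embeddings
    lm_head = 0 if cfg["tie_word_embeddings"] else v * h  # separate output projection
    attention = 4 * h * h                                # q, k, v and o projections
    mlp = 3 * h * i                                      # gate, up and down projections
    norms = 2 * h                                        # two RMSNorm weights per block
    per_layer = attention + mlp + norms
    final_norm = h

    return embeddings + lm_head + cfg["num_hidden_layers"] * per_layer + final_norm


n_params = estimate_llama_params(config)
head_dim = config["hidden_size"] // config["num_attention_heads"]
print(f"{n_params:,} parameters, head_dim={head_dim}, "
      f"about {n_params * 4 / 1e6:.1f} MB in fp32")
```

Under these assumptions the estimate comes out a little over 3 million parameters, in the same ballpark as the headline figure. Any gap would come from details the tables do not list.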
| Hyperparameter | Value |
|---|---|
| output_dir | "./cinnabarlm-v2" |
| max_steps | 10000 |
| per_device_train_batch_size | 8 |
| gradient_accumulation_steps | 4 |
| learning_rate | 6e-4 |
| weight_decay | 0.01 |
| warmup_steps | 500 |
| lr_scheduler_type | "cosine" |
| logging_steps | 100 |
| save_steps | 2000 |
| fp16 | True |
| save_total_limit | 2 |
| prediction_loss_only | True |
| logging_first_step | True |
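
The batch size and learning-rate settings above interact in ways that are easy to miss, so here is a small, dependency-free sketch of what they imply. It assumes a single GPU and the usual "linear warmup, then half-cosine decay to zero" shape for a cosine schedule; `lr_at` is a hypothetical helper written for illustration, not the trainer's own scheduler.

```python
import math

# Values copied from the hyperparameter table above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
max_steps = 10_000
warmup_steps = 500
learning_rate = 6e-4

# Assumption: one GPU, so the effective batch is per-device size x accumulation steps.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
sequences_seen = effective_batch_size * max_steps


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate at an optimizer step.

    Linear warmup from 0 to the peak, then half-cosine decay to 0 at max_steps.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"effective batch size: {effective_batch_size}")
print(f"sequences seen over training: {sequences_seen:,}")
for step in (0, 250, 500, 5250, 10_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

The peak rate of 6e-4 is reached at step 500, and the rate falls to half of that midway through the decay phase.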