# opt-babylm1_seed-1024_5e-6
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.9422
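Since the loss is a per-token cross-entropy, the final validation loss corresponds to a perplexity of exp(2.9422) ≈ 18.96. A minimal check:

```python
import math

# Perplexity is the exponential of the cross-entropy loss.
eval_loss = 2.9422  # final validation loss from the table below
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ≈ 18.96
```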
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 1024
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
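The effective batch size of 256 comes from combining the per-device batch with gradient accumulation, and the linear schedule ramps the learning rate up over the first 5% of steps before decaying it to zero. A minimal sketch of that arithmetic (assuming a single device, which matches the reported total):

```python
# Effective batch size: per-device train batch * gradient accumulation steps.
train_batch_size = 32
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 256

# Linear schedule with warmup_ratio=0.05: the LR rises from 0 to its peak
# over the first 5% of optimizer steps, then decays linearly back to 0.
learning_rate = 5e-6
warmup_ratio = 0.05

def lr_at(step, total_steps):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate * (total_steps - step) / (total_steps - warmup_steps)
```

For example, with 1000 total steps the LR peaks at step 50 (5e-6) and reaches 0 at step 1000.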
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.2297 | 0.4206 | 1000 | 5.1826 |
| 4.7637 | 0.8413 | 2000 | 4.7401 |
| 4.299 | 1.2616 | 3000 | 4.2874 |
| 3.9607 | 1.6823 | 4000 | 3.9489 |
| 3.6494 | 2.1026 | 5000 | 3.6517 |
| 3.4719 | 2.5233 | 6000 | 3.4667 |
| 3.3668 | 2.9439 | 7000 | 3.3572 |
| 3.2732 | 3.3643 | 8000 | 3.2821 |
| 3.2214 | 3.7849 | 9000 | 3.2332 |
| 3.1418 | 4.2053 | 10000 | 3.1875 |
| 3.1246 | 4.6259 | 11000 | 3.1543 |
| 3.0548 | 5.0463 | 12000 | 3.1296 |
| 3.051 | 5.4669 | 13000 | 3.1023 |
| 3.0457 | 5.8875 | 14000 | 3.0826 |
| 2.9936 | 6.3079 | 15000 | 3.0715 |
| 2.9888 | 6.7285 | 16000 | 3.0553 |
| 2.9411 | 7.1489 | 17000 | 3.0484 |
| 2.9517 | 7.5695 | 18000 | 3.0344 |
| 2.9439 | 7.9902 | 19000 | 3.0212 |
| 2.9073 | 8.4105 | 20000 | 3.0138 |
| 2.9174 | 8.8312 | 21000 | 3.0022 |
| 2.8815 | 9.2515 | 22000 | 3.0026 |
| 2.8825 | 9.6722 | 23000 | 2.9974 |
| 2.8312 | 10.0925 | 24000 | 2.9940 |
| 2.8472 | 10.5132 | 25000 | 2.9875 |
| 2.8536 | 10.9338 | 26000 | 2.9748 |
| 2.8264 | 11.3542 | 27000 | 2.9771 |
| 2.8321 | 11.7748 | 28000 | 2.9682 |
| 2.7887 | 12.1952 | 29000 | 2.9709 |
| 2.7964 | 12.6158 | 30000 | 2.9657 |
| 2.7693 | 13.0362 | 31000 | 2.9662 |
| 2.7821 | 13.4568 | 32000 | 2.9598 |
| 2.7789 | 13.8774 | 33000 | 2.9547 |
| 2.7499 | 14.2978 | 34000 | 2.9573 |
| 2.7644 | 14.7184 | 35000 | 2.9529 |
| 2.7347 | 15.1388 | 36000 | 2.9533 |
| 2.736 | 15.5594 | 37000 | 2.9505 |
| 2.7476 | 15.9801 | 38000 | 2.9454 |
| 2.7259 | 16.4004 | 39000 | 2.9481 |
| 2.7222 | 16.8211 | 40000 | 2.9446 |
| 2.7054 | 17.2414 | 41000 | 2.9468 |
| 2.7133 | 17.6621 | 42000 | 2.9437 |
| 2.6935 | 18.0824 | 43000 | 2.9455 |
| 2.6976 | 18.5031 | 44000 | 2.9438 |
| 2.7072 | 18.9237 | 45000 | 2.9419 |
| 2.6934 | 19.3441 | 46000 | 2.9426 |
| 2.6919 | 19.7647 | 47000 | 2.9422 |
### Framework versions
- Transformers 4.54.0
- Pytorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4