# Auron-279M (Archived)
Note: This model is part of a scaling study. Auron-279M reached a final val_loss of 3.188, virtually identical to the 4× larger 1.1B model (3.180), revealing a scaling wall in the Ouroboros weight-sharing mechanism. The 510M model is the best-performing Chimera variant.
For inference and testing, use Auron-510M (val_loss 3.035).
| Model | Params | Final Val Loss | Status |
|---|---|---|---|
| Auron-279M | 279M | 3.188 | Archived |
| Auron-510M | 510M | 3.035 | Best |
| Auron-1.1B | 1.1B | 3.180 | Archived |
Paper: Auron | Code: github.com/Fy-/Auron | Blog: HuggingFace
## Architecture
- Type: Chimera (4 bottom + 4×3 top = 16 virtual)
- Dim: 1024, head_dim=64, expand_v=2
- Params: 279M (123M unique + 155M embed)
- Trained: 250K steps, 5B tokens, WSD (warmup-stable-decay) schedule
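The Chimera layout above implies a simple virtual-layer schedule: 4 unique bottom layers run once, then 4 shared top layers are recurred 3 times, giving 16 virtual layers from 8 physical ones. A minimal sketch of that mapping (the function name and layout are illustrative, not taken from the Auron codebase):

```python
def chimera_schedule(n_bottom=4, n_top=4, n_loops=3):
    """Map virtual-layer steps to physical layer indices.

    Bottom layers run once; top layers are reused n_loops times,
    so their parameters are shared across recurrences.
    """
    bottom = list(range(n_bottom))                 # [0, 1, 2, 3]
    top = list(range(n_bottom, n_bottom + n_top))  # [4, 5, 6, 7]
    return bottom + top * n_loops                  # 4 + 4*3 = 16 steps

schedule = chimera_schedule()
print(len(schedule))       # 16 virtual layers
print(len(set(schedule)))  # 8 physical layers
```

This is why the unique-parameter count (123M) is far below a plain 16-layer model of the same width: 8 of the 16 forward passes reuse the top block's weights.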
```python
from ouro import load_model, generate

# Load the recommended 510M checkpoint rather than this archived model
model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```
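The WSD schedule mentioned above holds the learning rate flat between a linear warmup and a final decay phase. A hedged sketch of the shape, with hypothetical phase boundaries (Auron's actual warmup and decay lengths are not stated here):

```python
def wsd_lr(step, max_lr=3e-4, total_steps=250_000,
           warmup_steps=2_000, decay_steps=25_000):
    """Warmup-stable-decay learning-rate schedule.

    Linear warmup, constant plateau, then linear decay to zero.
    Phase lengths and max_lr are illustrative, not Auron's values.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    if step < decay_start:
        return max_lr
    return max_lr * (total_steps - step) / decay_steps
```

Compared to cosine decay, WSD lets checkpoints from the stable plateau be branched into fresh decay runs, which is convenient for scaling studies that compare several model sizes.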
Built by Florian Gasquez (@nyxia). Part of Soulkyn.
