Auron-279M (Archived)

Note: This model is part of a scaling study. The 279M model reached a final val_loss of 3.188, virtually identical to the 4× larger 1.1B model (3.180), revealing a scaling wall in the Ouroboros weight-sharing mechanism. The 510M model is the best-performing Chimera variant.

For inference and testing, use Auron-510M (val_loss 3.035).

| Model | Params | Final Val Loss | Status |
|---|---|---|---|
| Auron-279M | 279M | 3.188 | Archived |
| Auron-510M | 510M | 3.035 | **Best** |
| Auron-1.1B | 1.1B | 3.180 | Archived |

Paper: Auron | Code: github.com/Fy-/Auron | Blog: HuggingFace

Architecture

  • Type: Chimera (4 bottom + 4×3 top = 16 virtual)
  • Dim: 1024, head_dim=64, expand_v=2
  • Params: 279M (123M unique + 155M embed)
  • Trained: 250K steps, 5B tokens, WSD schedule
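
The layer arithmetic above (4 bottom + 4×3 top = 16 virtual) can be sketched as a schedule over unique weight sets. This is an illustrative reconstruction, not code from the repository; the names and the assumption that the top block is cycled as a whole (rather than each layer repeated in place) are mine.

```python
# Hypothetical sketch of the Chimera virtual-layer schedule:
# 4 unique "bottom" layers run once, then a block of 4 shared "top"
# layers is reused 3 times (Ouroboros-style weight sharing), yielding
# 16 virtual layers from only 8 unique transformer blocks.

N_BOTTOM = 4      # unique bottom layers, applied once
N_TOP = 4         # shared top layers in the reused block
TOP_REPEATS = 3   # how many times the top block is cycled

def virtual_layer_schedule():
    """Return the sequence of unique-layer indices executed in one forward pass."""
    bottom = list(range(N_BOTTOM))              # [0, 1, 2, 3]
    top = [N_BOTTOM + i for i in range(N_TOP)]  # [4, 5, 6, 7]
    return bottom + top * TOP_REPEATS           # 4 + 4*3 = 16 virtual layers

schedule = virtual_layer_schedule()
print(len(schedule))       # 16 virtual layers
print(len(set(schedule)))  # 8 unique weight sets
```

This matches the "123M unique" figure in spirit: parameters are counted once per unique layer, while depth at inference time is the virtual count.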
```python
from ouro import load_model, generate

model, tokenizer, device = load_model("nyxia/Auron-510M")  # use the best-performing 510M
generate(model, tokenizer, device, "The history of")
```

Built by Florian Gasquez (@nyxia). Part of Soulkyn.
