# Auron-279M (Archived)
Note: This model is part of a scaling study. Auron-279M reached a final val_loss of 3.188, virtually identical to the 4× larger 1.1B model (3.180), revealing a scaling wall in the Ouroboros weight-sharing mechanism. The 510M model is the best-performing Chimera variant.
For inference and testing, use Auron-510M (val_loss 3.035).
| Model | Params | Final Val Loss | Status |
|---|---|---|---|
| Auron-279M | 279M | 3.188 | Archived |
| Auron-510M | 510M | 3.035 | Best |
| Auron-1.1B | 1.1B | 3.180 | Archived |
Paper: Auron | Code: github.com/Fy-/Auron | Blog: HuggingFace
## Architecture
- Type: Chimera (4 bottom + 4×3 top = 16 virtual)
- Dim: 1024, head_dim=64, expand_v=2
- Params: 279M (123M unique + 155M embed)
- Trained: 250K steps, 5B tokens, WSD (warmup-stable-decay) schedule
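The Chimera layout above implies a simple virtual-layer schedule: 4 unique bottom layers run once, then 4 shared top layers are recurred 3 times, giving 16 virtual layers from 8 physical ones. A minimal sketch of that mapping (the function name and layout are illustrative, not taken from the Auron codebase):

```python
def chimera_schedule(n_bottom=4, n_top=4, n_loops=3):
    """Map virtual-layer steps to physical layer indices.

    Bottom layers run once; top layers are reused n_loops times,
    so their parameters are shared across recurrences.
    """
    bottom = list(range(n_bottom))                 # [0, 1, 2, 3]
    top = list(range(n_bottom, n_bottom + n_top))  # [4, 5, 6, 7]
    return bottom + top * n_loops                  # 4 + 4*3 = 16 steps

schedule = chimera_schedule()
print(len(schedule))       # 16 virtual layers
print(len(set(schedule)))  # 8 physical layers
```

This is why the unique-parameter count (123M) is far below a plain 16-layer model of the same width: 8 of the 16 forward passes reuse the top block's weights.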
```python
from ouro import load_model, generate

# Load the recommended 510M checkpoint rather than this archived model
model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```
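The WSD schedule mentioned above holds the learning rate flat between a linear warmup and a final decay phase. A hedged sketch of the shape, with hypothetical phase boundaries (Auron's actual warmup and decay lengths are not stated here):

```python
def wsd_lr(step, max_lr=3e-4, total_steps=250_000,
           warmup_steps=2_000, decay_steps=25_000):
    """Warmup-stable-decay learning-rate schedule.

    Linear warmup, constant plateau, then linear decay to zero.
    Phase lengths and max_lr are illustrative, not Auron's values.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    if step < decay_start:
        return max_lr
    return max_lr * (total_steps - step) / decay_steps
```

Compared to cosine decay, WSD lets checkpoints from the stable plateau be branched into fresh decay runs, which is convenient for scaling studies that compare several model sizes.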
Built by Florian Gasquez (@nyxia). Part of Soulkyn.
