# Covenant-72B

## Model Overview
Covenant-72B is the largest permissionlessly, collaboratively trained language model to date: a 72-billion-parameter model trained entirely from scratch on 1.1 trillion tokens of English text.
For more details, see the technical report. This is a base model; see Covenant-72B-Chat for the instruction-tuned variant.
Covenant-72B was trained with 20+ globally distributed participants coordinated via decentralized infrastructure on the Bittensor blockchain. Unlike prior collaborative training efforts that use whitelisted compute, Covenant-72B is the first to achieve this scale with fully permissionless participation. Training used the SparseLoCo communication-efficient optimizer to reduce bandwidth requirements across distributed nodes.
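As the name suggests, a communication-efficient optimizer like SparseLoCo reduces inter-node traffic by exchanging only a sparse subset of each pseudo-gradient, with an error-feedback buffer so dropped coordinates are re-added in later rounds rather than lost. The sketch below illustrates that general top-k-with-error-feedback idea only; the function and variable names are ours, not from the Covenant codebase, and the actual optimizer involves further machinery (local updates, quantization) not shown here.

```python
import numpy as np

def topk_compress(pseudo_grad, k, error_buffer):
    """Keep only the k largest-magnitude entries for communication;
    stash everything else in an error buffer for the next round."""
    corrected = pseudo_grad + error_buffer          # re-inject past residual
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of top-k magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                    # only these k values are sent
    new_error = corrected - sparse                  # residual carried forward
    return sparse, new_error

rng = np.random.default_rng(0)
g = rng.normal(size=1000)                           # a toy pseudo-gradient
err = np.zeros_like(g)
sparse, err = topk_compress(g, 50, err)             # transmit 5% of coordinates
```

Because `sparse + err` always reconstructs the corrected gradient exactly, no information is permanently discarded; it is merely delayed.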
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16, sharding it across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B")

# Base model: prompt with plain text to continue, not a chat template
input_text = "The theory of general relativity"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
## Model Details
- Compute Participants: 20+ independent contributors on Bittensor
- Minimum Compute per Participant: 8×B200 or equivalent
- Model License: Apache 2.0
### Technical Specifications
| Parameter | Value |
|---|---|
| Parameter Size | 72B |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
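These hyperparameters can be sanity-checked against the 72B headline figure with a quick weight count. This is a rough estimate: it assumes an untied `lm_head` and ignores norm parameters, neither of which the table above confirms.

```python
# Back-of-the-envelope parameter count from the spec table
# (weight matrices only; untied lm_head assumed, norms ignored).
hidden, inter, layers = 8192, 28672, 80
heads, kv_heads, head_dim = 64, 8, 128
vocab = 262_144

attn = hidden * heads * head_dim            # q_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj + v_proj (GQA: 8 KV heads)
attn += heads * head_dim * hidden           # o_proj
mlp = 3 * hidden * inter                    # gate, up, down projections
embeddings = 2 * vocab * hidden             # input embeddings + lm_head

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.1f}B")                # ≈ 72.7B, matching the 72B label
```

Note how the 8 KV heads (grouped-query attention) shrink the K/V projections to an eighth of the Q projection's size.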
### Training Details
- Dataset: DCLM-baseline
- Tokens: 1.1 Trillion
- Optimizer: SparseLoCo (communication-efficient optimizer)
## Performance on Benchmarks
All results are 0-shot acc_norm (%) unless noted.
| Model | Size | Tokens | ARC-C | ARC-E | PIQA | OBQA | HellaSwag | WinoGrande* | MMLU* |
|---|---|---|---|---|---|---|---|---|---|
| Covenant-72B | 72B | 1.1T | 56.83 | 80.93 | 81.56 | 44.00 | 80.61 | 75.85 | 67.11 |
| INTELLECT-1 | 10B | 1T | 44.80 | 71.76 | 77.37 | 43.80 | 70.26 | 63.30 | 32.69 |
| Psyche Consilience | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
| LLM360 K2 ckpt_108 | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
| LLM360 K2 | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
| LLaMA-2-7B | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
| LLaMA-2-70B | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |
*WinoGrande and MMLU use acc rather than acc_norm.
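The acc / acc_norm distinction matters for multiple-choice benchmarks: acc picks the option whose completion has the highest total log-likelihood, while acc_norm divides by completion length (token count in this toy; some harnesses normalize by byte length) so longer options are not penalized. A small illustration with invented log-probabilities:

```python
# Toy multiple-choice scoring: raw vs length-normalized log-likelihood.
# The per-token log-probabilities below are made up for illustration.
candidates = {
    "gravity": [-2.0, -1.5, -1.0],                                  # 3 tokens
    "the curvature of spacetime": [-1.2, -0.9, -1.1, -1.0, -0.8],   # 5 tokens
}

def score(logprobs, normalize):
    total = sum(logprobs)
    return total / len(logprobs) if normalize else total

acc_pick = max(candidates, key=lambda c: score(candidates[c], normalize=False))
acc_norm_pick = max(candidates, key=lambda c: score(candidates[c], normalize=True))
```

Here the two metrics disagree: the short answer wins on raw log-likelihood, the longer one after normalization, which is why model cards report which variant each column uses.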