# Covenant-72B

## Model Overview
Covenant-72B is the largest permissionlessly, collaboratively trained language model to date: a 72-billion-parameter model trained entirely from scratch on 1.1 trillion tokens of English text.
For more details, see the technical report. This is a base model; see Covenant-72B-Chat for the instruction-tuned variant.
Covenant-72B was trained with 20+ globally distributed participants coordinated via decentralized infrastructure on the Bittensor blockchain. Unlike prior collaborative training efforts that use whitelisted compute, Covenant-72B is the first to achieve this scale with fully permissionless participation. Training used the SparseLoCo communication-efficient optimizer to reduce bandwidth requirements across distributed nodes.
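As the name suggests, a communication-efficient optimizer like SparseLoCo reduces inter-node traffic by exchanging only a sparse subset of each pseudo-gradient, with an error-feedback buffer so dropped coordinates are re-added in later rounds rather than lost. The sketch below illustrates that general top-k-with-error-feedback idea only; the function and variable names are ours, not from the Covenant codebase, and the actual optimizer involves further machinery (local updates, quantization) not shown here.

```python
import numpy as np

def topk_compress(pseudo_grad, k, error_buffer):
    """Keep only the k largest-magnitude entries for communication;
    stash everything else in an error buffer for the next round."""
    corrected = pseudo_grad + error_buffer          # re-inject past residual
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of top-k magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                    # only these k values are sent
    new_error = corrected - sparse                  # residual carried forward
    return sparse, new_error

rng = np.random.default_rng(0)
g = rng.normal(size=1000)                           # a toy pseudo-gradient
err = np.zeros_like(g)
sparse, err = topk_compress(g, 50, err)             # transmit 5% of coordinates
```

Because `sparse + err` always reconstructs the corrected gradient exactly, no information is permanently discarded; it is merely delayed.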
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16, sharding it across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B")

# Base model: prompt with plain text to continue, not a chat template
input_text = "The theory of general relativity"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
## Model Details
- Compute Participants: 20+ independent contributors on Bittensor
- Minimum Compute per Participant: 8×B200 or equivalent
- Model License: Apache 2.0
### Technical Specifications
| Parameter | Value |
|---|---|
| Parameter Size | 72B |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
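These hyperparameters can be sanity-checked against the 72B headline figure with a quick weight count. This is a rough estimate: it assumes an untied `lm_head` and ignores norm parameters, neither of which the table above confirms.

```python
# Back-of-the-envelope parameter count from the spec table
# (weight matrices only; untied lm_head assumed, norms ignored).
hidden, inter, layers = 8192, 28672, 80
heads, kv_heads, head_dim = 64, 8, 128
vocab = 262_144

attn = hidden * heads * head_dim            # q_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj + v_proj (GQA: 8 KV heads)
attn += heads * head_dim * hidden           # o_proj
mlp = 3 * hidden * inter                    # gate, up, down projections
embeddings = 2 * vocab * hidden             # input embeddings + lm_head

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.1f}B")                # ≈ 72.7B, matching the 72B label
```

Note how the 8 KV heads (grouped-query attention) shrink the K/V projections to an eighth of the Q projection's size.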
### Training Details
- Dataset: DCLM-baseline
- Tokens: 1.1 Trillion
- Optimizer: SparseLoCo (communication-efficient optimizer)
## Performance on Benchmarks
All results are 0-shot acc_norm (%) unless noted.
| Model | Size | Tokens | ARC-C | ARC-E | PIQA | OBQA | HellaSwag | WinoGrande* | MMLU* |
|---|---|---|---|---|---|---|---|---|---|
| Covenant-72B | 72B | 1.1T | 56.83 | 80.93 | 81.56 | 44.00 | 80.61 | 75.85 | 67.11 |
| INTELLECT-1 | 10B | 1T | 44.80 | 71.76 | 77.37 | 43.80 | 70.26 | 63.30 | 32.69 |
| Psyche Consilience | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
| LLM360 K2 ckpt_108 | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
| LLM360 K2 | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
| LLaMA-2-7B | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
| LLaMA-2-70B | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |
*WinoGrande and MMLU use acc rather than acc_norm.
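The acc / acc_norm distinction matters for multiple-choice benchmarks: acc picks the option whose completion has the highest total log-likelihood, while acc_norm divides by completion length (token count in this toy; some harnesses normalize by byte length) so longer options are not penalized. A small illustration with invented log-probabilities:

```python
# Toy multiple-choice scoring: raw vs length-normalized log-likelihood.
# The per-token log-probabilities below are made up for illustration.
candidates = {
    "gravity": [-2.0, -1.5, -1.0],                                  # 3 tokens
    "the curvature of spacetime": [-1.2, -0.9, -1.1, -1.0, -0.8],   # 5 tokens
}

def score(logprobs, normalize):
    total = sum(logprobs)
    return total / len(logprobs) if normalize else total

acc_pick = max(candidates, key=lambda c: score(candidates[c], normalize=False))
acc_norm_pick = max(candidates, key=lambda c: score(candidates[c], normalize=True))
```

Here the two metrics disagree: the short answer wins on raw log-likelihood, the longer one after normalization, which is why model cards report which variant each column uses.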