
Covenant-72B

Model Overview

Covenant-72B is the largest permissionlessly and collaboratively trained language model to date, trained entirely from scratch at the 72-billion-parameter scale on 1.1 trillion tokens of English text.

For more details, see the technical report. This is a base model. See Covenant-72B-Chat for the instruction-tuned variant.

Covenant-72B was trained with 20+ globally distributed participants coordinated via decentralized infrastructure on the Bittensor blockchain. Unlike prior collaborative training efforts that use whitelisted compute, Covenant-72B is the first to achieve this scale with fully permissionless participation. Training used the SparseLoCo communication-efficient optimizer to reduce bandwidth requirements across distributed nodes.
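SparseLoCo's details are in the technical report; as a rough illustration of the communication-efficient idea, the sketch below shows top-k sparsification with error feedback, a common ingredient of such optimizers. The function names, plain-Python lists, and the choice of k are illustrative assumptions, not the actual implementation:

```python
def topk_sparsify(vec, k):
    """Keep only the k largest-magnitude entries; zero the rest.
    Only the surviving (index, value) pairs need to be communicated."""
    idx = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    keep = set(idx)
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

def communicate_round(pseudo_grad, error, k):
    """One communication round with error feedback: the residual that was
    not transmitted is carried over and added back next round, so nothing
    is permanently lost to sparsification."""
    to_send = [g + e for g, e in zip(pseudo_grad, error)]
    sparse = topk_sparsify(to_send, k)
    new_error = [t - s for t, s in zip(to_send, sparse)]
    return sparse, new_error
```

Because only the top-k entries cross the network each round, bandwidth drops by roughly the sparsity factor, which is what makes training over ordinary internet links between distributed nodes feasible.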

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in bfloat16, sharded across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B")

# Greedy completion of a short prompt (this is a base model,
# so expect continuations rather than chat-style answers).
input_text = "The theory of general relativity"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
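As a back-of-the-envelope guide (an estimate, not an official requirement): the bf16 weights alone occupy roughly 134 GiB, which is why the `device_map="auto"` sharding above, across several GPUs or with CPU offload, is usually needed:

```python
# Memory for bf16 weights only (2 bytes per parameter); activations and
# the KV cache during generation add more on top of this.
params = 72e9
weight_gib = params * 2 / 2**30
print(f"~{weight_gib:.0f} GiB of weights")
```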

Model Details

  • Compute Participants: 20+ independent contributors on Bittensor
  • Minimum Compute per Participant: 8×B200 or equivalent
  • Model License: Apache 2.0

Technical Specifications

| Parameter | Value |
|---|---|
| Parameter Size | 72B |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
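These dimensions can be sanity-checked against the 72B headline. Ignoring norms and biases, and assuming a SwiGLU-style MLP and untied input/output embeddings (assumptions of this sketch, not confirmed by the card), the blocks plus embeddings come to roughly 72.7B parameters:

```python
hidden, inter, layers = 8192, 28672, 80
vocab, kv_heads, head_dim = 262_144, 8, 128

# Attention: Q and O projections are hidden x hidden; K and V project
# down to the 8 KV heads (grouped-query attention).
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
# SwiGLU-style MLP: gate, up, and down projections.
mlp = 3 * hidden * inter
blocks = layers * (attn + mlp)

# Input embedding plus (assumed untied) output head.
embed = 2 * vocab * hidden

total = blocks + embed
print(f"{total / 1e9:.1f}B")  # roughly 72.7B, consistent with the card
```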

Training Details:

  • Dataset: DCLM-baseline
  • Tokens: 1.1 Trillion
  • Optimizer: SparseLoCo (communication-efficient optimizer)
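Optimizers in this family typically follow a DiLoCo-style outer/inner loop: each participant runs many local steps, then the coordinator treats each worker's drift from the shared weights as a pseudo-gradient. The toy sketch below (hypothetical names, plain-Python lists in place of real tensors) shows the shape of one outer step under those assumptions:

```python
def outer_step(global_params, worker_params_list, outer_lr=1.0):
    """Average each worker's drift from the shared parameters (the
    "pseudo-gradient") and apply it as an outer update. In a
    communication-efficient setup, these deltas would be compressed
    before being sent over the network."""
    n = len(worker_params_list)
    deltas = [
        sum(global_params[i] - wp[i] for wp in worker_params_list) / n
        for i in range(len(global_params))
    ]
    return [g - outer_lr * d for g, d in zip(global_params, deltas)]
```

With a single worker and `outer_lr=1.0`, the shared parameters simply adopt that worker's local result; with many workers, symmetric drifts cancel and only the consensus direction survives.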

Performance on Benchmarks

All results are 0-shot acc_norm (%) unless noted.

| Model | Size | Tokens | ARC-C | ARC-E | PIQA | OBQA | HellaSwag | WinoGrande* | MMLU* |
|---|---|---|---|---|---|---|---|---|---|
| Covenant-72B | 72B | 1.1T | 56.83 | 80.93 | 81.56 | 44.00 | 80.61 | 75.85 | 67.11 |
| INTELLECT-1 | 10B | 1T | 44.80 | 71.76 | 77.37 | 43.80 | 70.26 | 63.30 | 32.69 |
| Psyche Consilience | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
| LLM360 K2 ckpt_108 | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
| LLM360 K2 | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
| LLaMA-2-7B | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
| LLaMA-2-70B | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |

*WinoGrande and MMLU report acc rather than acc_norm.
