DeciLM 6B

DeciLM 6B is a 5.7 billion parameter decoder-only text generation model. With a context window of 4096 tokens, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency. The model's architecture was generated using Deci's proprietary Neural Architecture Search-based technology, AutoNAC.

Model Details

Model Description

Deci developed and publically released the DeciLM 6B large language model, a pretrained, high-efficiency generative text model with 5.7 billion parameters. DeciLM 6B outpaces pretrained models in its class, with a throughput that's up to 15 times that of Llama 2 7B's. DeciLM-6B was further fine-tuned using LoRA for instruction following on a subset of the OpenOrca dataset, creating DeciLM 6B-Instruct

Developed by: Deci
Model type: DeciLM is an auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention.
Language(s) (NLP): English
License: Llama 2 Community License Agreement with an extention of Deci regarding hosting service providers.

Model Architecture

Parameters	Layers	Heads	Sequence Length	GQA num_key_value_heads*	Hidden Size
5.7B	32	32	4096	Variable	4096

*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each layer of the model.

Decoder layer: Varible Grouped Query Attention. Grouped Query Attention (GQA) was introduced in Ainslie et al., 2023
Position Embeddings: Dynamic NTK Scaling Rotary Position Embeddings Su et al., 2021

Model Sources

Uses

The model is intended for commercial and research use in English and can be fine-tuned for use in other languages.

How to Get Started with the Model

Use the code below to get started with the model.

# pip install -q transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-6b"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)

inputs = tokenizer.encode("In a shocking finding, scientists discovered a herd of unicorns living in", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0]))

Training Details

DeciLM 6B underwent training utilizing a subset of the SlimPajamas dataset, leveraging advanced proprietary methodologies allowing for fast training.

Evaluation

Below are DeciLM's 6B evaluation results.

Average	ARC Challenge*	ARC Easy*	BoolQ	HellaSwag*	LAMBDA OpenAI	OpenBookQA	PIQA	TruthfulQA	Winogrande
60.33	42.06	70.02	71.01	74.58	69.78	34	77.09	36.19	68.03
Accuracy-norm score*

Runtime Benchmarks

Inference Tool/Hardware	A10 (tokens/sec)
PyTorch	652.49
Infery LLM	2,029.6

Throughput (tokens/sec) - Measured with optimal batch - PyTorch BS 64, Infery LLM BS 128
In order to replicate the results of the PyTorch benchmark, use this code example

How to Cite

Please cite this model using this format.

@misc{DeciFoundationModels,
title = {DeciLM 6B},
author = {DeciAI Research Team},
year = {2023}
url={[https://huggingface.co/Deci/DeciLM-6b](https://huggingface.co/Deci/DeciLM-6b)},
}