A 30B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks, powered by our ADS (Adaptive Dual-Search Distillation) technique. The largest model in the Kai family.
| Model | Kai-30B-Instruct |
|---|---|
| Architecture | `LlamaForCausalLM` |
| Parameters | ~30B |
| Hidden size | 7168 |
| Intermediate size | 20480 |
| Layers | 60 |
| Attention heads | 56 (8 KV heads, GQA) |
| Head dim | 128 |
| Context length | 4096 |
| Precision | bfloat16 |
| Vocab size | 64,000 |
| Chat template | ChatML (`<|im_start|>` / `<|im_end|>`) |
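The table's GQA configuration (8 KV heads rather than 56) directly bounds inference memory. As a quick back-of-the-envelope check, the KV cache footprint at full context can be computed from the figures above (bf16 = 2 bytes per value; real runtimes add overhead on top of this):

```python
# Estimate KV-cache memory from the model card's configuration table.
layers = 60
kv_heads = 8          # GQA: 8 KV heads shared across 56 query heads
head_dim = 128
bytes_per_value = 2   # bfloat16
context = 4096

# Both K and V are cached, hence the factor of 2
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
kv_bytes_full_context = kv_bytes_per_token * context

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")        # 240 KiB
print(f"KV cache at 4096 context: {kv_bytes_full_context / 1024**3:.2f} GiB")  # 0.94 GiB
```

So even at maximum context, the KV cache stays under 1 GiB; the model weights themselves dominate memory use.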
Adaptive Dual-Search Distillation treats model fine-tuning as a constrained optimization problem inspired by Operations Research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding space entropy — forcing the model to converge to high-confidence predictions at difficult reasoning points, without modifying the model architecture.
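The ADS objective itself is not published, but the mechanism described above can be sketched as a Lagrangian-style penalty: a stateful dual factor grows by dual ascent while the model's entropy exceeds a target, taxing low-confidence predictions. This is an illustrative sketch only: it uses predictive entropy over the vocabulary as a stand-in for the paper's embedding-space entropy, and the names `entropy_target` and `dual_lr` are assumptions, not the authors' hyperparameters.

```python
import torch
import torch.nn.functional as F

def ads_style_loss(logits, labels, lam, entropy_target=1.0, dual_lr=0.01):
    """Hypothetical ADS-style penalized loss with a stateful dual factor `lam`."""
    # Standard cross-entropy term (the distillation target is omitted for brevity)
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

    # Mean per-token predictive entropy of the current model distribution
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(-1).mean()

    # Penalized objective: entropy above the target is taxed by the dual factor
    loss = ce + lam * F.relu(entropy - entropy_target)

    # Dual ascent: the penalty factor grows while the constraint is violated,
    # forcing convergence toward high-confidence (low-entropy) predictions
    new_lam = max(0.0, lam + dual_lr * (entropy.item() - entropy_target))
    return loss, new_lam

# Toy usage with random logits over the 64,000-token vocabulary
logits = torch.randn(2, 5, 64000)
labels = torch.randint(0, 64000, (2, 5))
loss, lam = ads_style_loss(logits, labels, lam=0.0)
```

Because `lam` carries over between steps, the penalty adapts: easy batches leave it near zero, while persistently uncertain (hard) reasoning points drive it up, which is the constrained-optimization behavior the description attributes to ADS.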
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-30B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-30B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.8,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
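If you need to build prompts by hand (e.g. for a serving stack that does not apply the tokenizer's template), the ChatML format from the table renders as explicit `<|im_start|>` / `<|im_end|>` turns. The sketch below assumes no default system prompt; the bundled tokenizer config is authoritative, so prefer `apply_chat_template` when available:

```python
def chatml_prompt(messages):
    """Render a list of {role, content} messages as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the reply
    return "".join(parts)

prompt = chatml_prompt([{"role": "user", "content": "What is 25 * 4?"}])
print(prompt)
```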
```bibtex
@misc{noesislab2026kai30b,
  title  = {Kai-30B-Instruct},
  author = {NoesisLab},
  year   = {2026},
  url    = {https://huggingface.co/NoesisLab/Kai-30B-Instruct}
}
```
License: Apache 2.0