Instructions for using DeepBrainz/DeepBrainz-R1-4B with libraries and local apps.

Libraries

How to use DeepBrainz/DeepBrainz-R1-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DeepBrainz/DeepBrainz-R1-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DeepBrainz/DeepBrainz-R1-4B")
model = AutoModelForCausalLM.from_pretrained("DeepBrainz/DeepBrainz-R1-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use DeepBrainz/DeepBrainz-R1-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DeepBrainz/DeepBrainz-R1-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DeepBrainz/DeepBrainz-R1-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
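
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` field. Pointing `base_url` at `http://localhost:30000` should work the same way for the SGLang server described in this document.

```python
import json
import urllib.request

MODEL_ID = "DeepBrainz/DeepBrainz-R1-4B"


def build_chat_request(base_url, prompt, model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, prompt, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body (choices -> message -> content).
    """
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires the server started above to be running):
# print(chat("http://localhost:8000", "What is the capital of France?"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.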

Use Docker

docker model run hf.co/DeepBrainz/DeepBrainz-R1-4B

SGLang

How to use DeepBrainz/DeepBrainz-R1-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DeepBrainz/DeepBrainz-R1-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DeepBrainz/DeepBrainz-R1-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DeepBrainz/DeepBrainz-R1-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DeepBrainz/DeepBrainz-R1-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use DeepBrainz/DeepBrainz-R1-4B with Docker Model Runner:
```
docker model run hf.co/DeepBrainz/DeepBrainz-R1-4B
```

DeepBrainz-R1-4B / README.md

ArunkumarVR

Update README.md

a053d78 verified 3 months ago

	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	tags:
	- deepbrainz
	- reasoning
	- mathematics
	- code
	- enterprise
	- 4b
	- long-context
	- 32k
	library_name: transformers

	---

	### 🚀 Introducing DeepBrainz-R1 — Reasoning-First Small Language Models for Agentic Systems

	Today we’re releasing DeepBrainz-R1, a family of reasoning-first Small Language Models (SLMs) designed for agentic AI systems in real-world production.

	Agentic systems don’t ask once — they reason repeatedly. Tool calls, verification loops, schema-constrained outputs, retries, and long-context planning fundamentally change the economics and reliability requirements of language models. LLM-only stacks struggle under this load.

	DeepBrainz-R1 is built from the opposite premise:

	> Reasoning is a trained behavior, not an emergent side-effect of scale.

	#### What DeepBrainz-R1 is designed for

	* Repeatable multi-step reasoning, not one-shot chat
	* Agent-compatible behavior: tool use, structured outputs, low-variance reasoning
	* Production economics: lower latency, predictable cost, deployability
	* Inference-time scalability: compute where needed, not everywhere
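
The verification loops, schema-constrained outputs, and retries mentioned above can be pictured with a small sketch. This is not a DeepBrainz API: `generate` is a hypothetical callable standing in for whatever client sends a prompt to the model and returns its text, and the "schema" here is simplified to a set of required top-level keys.

```python
import json
from typing import Callable, Optional


def generate_validated_json(
    generate: Callable[[str], str],
    prompt: str,
    required_keys: set,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call `generate` until it returns a JSON object containing the required keys.

    Returns the parsed object, or None if every attempt fails validation.
    """
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as err:
            problem = f"invalid JSON ({err.msg})"
        else:
            if not isinstance(parsed, dict):
                problem = "top-level value is not an object"
            else:
                missing = required_keys - set(parsed)
                if not missing:
                    return parsed
                problem = f"missing keys: {sorted(missing)}"
        # Feed the failure back so the next attempt can correct it.
        current_prompt = (
            f"{prompt}\n\nPrevious answer was rejected: {problem}. "
            "Reply with JSON only."
        )
    return None
```

Each rejected attempt costs one more model call, which is why per-call latency and cost matter more in agent loops than in one-shot chat.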

	#### The R1 lineup

	* [DeepBrainz-R1-4B](https://huggingface.co/DeepBrainz/DeepBrainz-R1-4B) — Flagship production model
	Best starting point for reliable agentic systems.
	* [DeepBrainz-R1-2B](https://huggingface.co/DeepBrainz/DeepBrainz-R1-2B) — Balanced production model
	Strong reasoning with lower cost and latency.
	* [DeepBrainz-R1-0.6B-v2](https://huggingface.co/DeepBrainz/DeepBrainz-R1-0.6B-v2) — Canonical small model
	Cost-efficient baseline for small-model agent workloads.
	* [Long-context variants (16K / 40K)](https://huggingface.co/collections/DeepBrainz/deepbrainz-r1-reasoning-first-slms-for-agentic-systems) — early and experimental
	* [Research checkpoints](https://huggingface.co/collections/DeepBrainz/deepbrainz-r1-research-checkpoints) — raw artifacts for ablation and evaluation
	* [Community quantizations (GGUF, low-bit)](https://huggingface.co/collections/DeepBrainz/deepbrainz-r1-community-quantizations-gguf-and-low-bit) — community-maintained, not officially supported

	We publish supported releases, experimental variants, and research checkpoints separately to keep expectations clear for builders, enterprises, and researchers.

	#### Why now

	2026 is the year agentic AI stops being a demo and starts becoming infrastructure. Infrastructure cannot rely on LLM-only economics or LLM-only reliability.
	Reasoning-first SLMs are the only viable path to scaling agents sustainably.

	— DeepBrainz AI & Labs

	---

	# DeepBrainz-R1-4B

	DeepBrainz-R1-4B is a compact, high-performance reasoning model engineered by DeepBrainz AI & Labs. It is part of the DeepBrainz-R1 Series, designed to deliver frontier-class reasoning capabilities in cost-effective parameter sizes.

	This variant offers an extended context window (up to 32,768 tokens), making it suitable for medium-length document and code analysis.

	---

	## 🚀 Model Highlights

	- Parameter Count: ~4B
	- Context Window: 32,768 tokens
	- Context Type: Extended (RoPE)
	- Specialization: STEM Reasoning, Logic, Code Analysis
	- Architecture: Optimized Dense Transformer
	- Deployment: Ready for vLLM, SGLang, and local inference
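
Because the prompt and the generated tokens share the 32,768-token window, it can help to reserve the generation budget up front. The sketch below is illustrative only: the helper names are hypothetical, and dropping the oldest tokens is just one possible truncation policy (summarizing or chunking the input may suit some workloads better).

```python
CONTEXT_WINDOW = 32_768  # tokens, as stated in the model highlights


def max_prompt_tokens(max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return how many prompt tokens fit once the generation budget is reserved."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be between 1 and context_window - 1")
    return context_window - max_new_tokens


def truncate_to_budget(token_ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Keep the most recent prompt tokens that fit alongside the generation budget."""
    budget = max_prompt_tokens(max_new_tokens, context_window)
    return list(token_ids[-budget:])
```

For example, with `max_new_tokens=256` the prompt can use at most 32,768 - 256 = 32,512 tokens.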

	---

	## 🎯 Intended Use Cases

	- Agentic Workflows: Reliability in multi-step planning tasks.
	- Math & Science: Solving complex word problems and equations.
	- Code Generation: Writing and debugging algorithms.
	- Structured Data Extraction: Parsing and reasoning over unstructured text.

	> Note: This model has undergone post-training to enhance reasoning quality and agentic reliability.
	> It is not optimized for open-ended conversational chat without additional instruction tuning.

	---

	## 💻 Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "DeepBrainz/DeepBrainz-R1-4B"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype="bfloat16",
	    device_map="auto",
	)

	prompt = "Analyze the time complexity of the following algorithm:"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=256)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	---

	## 🏗️ Technical Summary

	The model has undergone post-training to improve reasoning quality, output stability, and robustness under agentic workloads.

	Detailed post-training recipes and dataset compositions are not fully disclosed.

	---

	## 🛡️ Limitations & Safety

	While this model demonstrates strong reasoning capabilities, it may still produce inaccurate information ("hallucinations"). Users should implement appropriate guardrails for production deployments.

	---

	## 📜 License

	This model is released under the Apache 2.0 license, allowing for academic and commercial use.

	---

	<div align="center">
	<b>DeepBrainz AI & Labs</b><br>
	<i>Advancing General Intelligence through Scalable Reasoning</i>
	</div>