How to use 8F-ai/Verus-R1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="8F-ai/Verus-R1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("8F-ai/Verus-R1")
model = AutoModelForImageTextToText.from_pretrained("8F-ai/Verus-R1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use 8F-ai/Verus-R1 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "8F-ai/Verus-R1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "8F-ai/Verus-R1",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
docker model run hf.co/8F-ai/Verus-R1
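The curl request to the vLLM server above can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server was started with `vllm serve` on the default port 8000, and the helper names `build_chat_request` and `send_chat_request` are hypothetical, not part of vLLM.

```python
import json
import urllib.request

# Assumption: the server runs locally on vLLM's default port 8000.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt: str, image_url: str, model: str = "8F-ai/Verus-R1") -> dict:
    """Build the same JSON body as the curl example: one user turn with text and an image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload: dict, url: str = VLLM_URL, timeout: float = 60.0) -> str:
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    payload = build_chat_request(
        "Describe this image in one sentence.",
        "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    )
    print(send_chat_request(payload))
```

The same payload should work against the SGLang server by passing `url="http://localhost:30000/v1/chat/completions"`, since both expose an OpenAI-compatible endpoint.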
How to use 8F-ai/Verus-R1 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "8F-ai/Verus-R1" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "8F-ai/Verus-R1",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "8F-ai/Verus-R1" \
--host 0.0.0.0 \
--port 30000
How to use 8F-ai/Verus-R1 with Docker Model Runner:
docker model run hf.co/8F-ai/Verus-R1
This repository contains model weights and configuration files for Verus-r1 in the Hugging Face Transformers format.
Compatible with Hugging Face Transformers, vLLM, SGLang, and other major inference frameworks.
Built for coding, reasoning, debugging, and concise general assistance.
| Property | Value |
|---|---|
| Parameters | ~2B |
| Context Length | 262,144 tokens |
| Architecture | Qwen3.5 |
| Chat Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
| Dtype | bfloat16 |
| License | Apache 2.0 |
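To make the ChatML chat format in the table concrete, here is a minimal sketch of how a message list maps onto `<|im_start|>` / `<|im_end|>` delimiters. The function `render_chatml` is hypothetical and only illustrates the generic ChatML layout; the template shipped with the tokenizer may add further content (for example a default system prompt or reasoning markers), so `apply_chat_template` remains the right call for real inference.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the generic ChatML layout."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # An open assistant header tells the model to start its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]
    print(render_chatml(demo))
```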
pip install "transformers>=4.52.0" accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "8F-ai/Verus-r1"

# Load the tokenizer and the bfloat16 weights; device_map="auto" relies on accelerate.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are Verus-r1, a reasoning coding assistant made by 8F-ai. You think through problems carefully before responding."
    },
    {
        "role": "user",
        "content": "Write a Python async context manager that manages a PostgreSQL connection pool using asyncpg."
    }
]

# Render the chat prompt as text and append the assistant header.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,  # temperature and top_p only take effect when sampling is enabled
        temperature=0.6,
        top_p=0.95,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
output = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)
print(output)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization; requires the bitsandbytes package.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained("8F-ai/Verus-r1")
model = AutoModelForCausalLM.from_pretrained(
    "8F-ai/Verus-r1",
    quantization_config=quantization_config,
    device_map="auto",
)
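As a rough guide to what 4-bit loading saves, the sketch below estimates the memory taken by the weights alone. It assumes exactly 2 billion parameters (the card only says ~2B) and ignores the KV cache, activations, quantization constants, and any layers that bitsandbytes keeps in higher precision, so real usage will be higher. The helper `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Return the size of the raw weights in GiB for a given precision."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


# Assumption: 2e9 parameters, taken from the "~2B" figure in the model card.
NUM_PARAMS = 2e9

if __name__ == "__main__":
    print(f"bfloat16: {weight_memory_gib(NUM_PARAMS, 16):.2f} GiB")  # about 3.73 GiB
    print(f"4-bit:    {weight_memory_gib(NUM_PARAMS, 4):.2f} GiB")   # about 0.93 GiB
```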
| Use Case | Example |
|---|---|
| Code Generation | Write functions, classes, and scripts |
| Debugging | Fix bugs from code or error messages |
| Code Review | Explain code and suggest improvements |
| Reasoning | Break down multi-step problems |
| Long Context | Work with long prompts and files |
| General Q&A | Answer clearly and concisely |
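For the long-context use case, it can help to check that a prompt leaves room for the reply before calling `generate`. The sketch below assumes the 262,144-token window is shared between prompt and generated tokens, as is usual for decoder-only models; the helpers `max_prompt_tokens` and `fits_context` are hypothetical, and in practice the prompt length would come from the tokenizer (for example `inputs["input_ids"].shape[-1]`).

```python
# Context length stated in the model card.
CONTEXT_LENGTH = 262_144


def max_prompt_tokens(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many prompt tokens fit once max_new_tokens are reserved for the reply."""
    if not 0 <= max_new_tokens <= context_length:
        raise ValueError("max_new_tokens must be between 0 and the context length")
    return context_length - max_new_tokens


def fits_context(prompt_tokens: int, max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> bool:
    """Check whether the prompt plus the reserved reply budget fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


if __name__ == "__main__":
    # With the max_new_tokens=2048 used in the generation example:
    print(max_prompt_tokens(2048))      # 260096
    print(fits_context(250_000, 2048))  # True
    print(fits_context(261_000, 2048))  # False
```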
@misc{verusr12026,
title = {Verus-r1: A Reasoning-Focused Coding Language Model with 262K Context},
author = {8F-ai},
year = {2026},
howpublished = {\url{https://huggingface.co/8F-ai/Verus-r1}},
note = {Apache 2.0 License}
}
Verus-r1 is released under the Apache License 2.0. See LICENSE for full terms.
Derived from Qwen/Qwen3.5-2B (Apache 2.0).