Instructions for using prithivMLmods/Vulpecula-4B with libraries and local apps.

Libraries

How to use prithivMLmods/Vulpecula-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Vulpecula-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Vulpecula-4B")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Vulpecula-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Vulpecula-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Vulpecula-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Vulpecula-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
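
The same request can be sent from Python. The following is a minimal sketch that uses only the standard library and assumes the server started above is listening on localhost:8000. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Vulpecula-4B"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", model=MODEL_ID, timeout=60):
    """Send the prompt to a running server and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url=base_url, model=model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` should return the reply text. Because the route is OpenAI-compatible, passing a different `base_url` (for example `http://localhost:30000`) should work with any other server that exposes the same endpoint.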

Use Docker

docker model run hf.co/prithivMLmods/Vulpecula-4B

SGLang

How to use prithivMLmods/Vulpecula-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Vulpecula-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Vulpecula-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Vulpecula-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Vulpecula-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use prithivMLmods/Vulpecula-4B with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Vulpecula-4B
```

Vulpecula-4B

Vulpecula-4B is fine-tuned on the traces of SK1.1, consisting of the same 1,000 entries with DeepSeek thinking trajectories, and is further fine-tuned on the Fine-Tome 100k and Open Math Reasoning datasets. This specialized 4B-parameter model is designed for mathematical reasoning, logical problem-solving, and structured content generation, with an emphasis on precision and step-by-step explanation.

GGUF: https://huggingface.co/prithivMLmods/Vulpecula-4B-GGUF

Key Features

Advanced Mathematical and Logical Reasoning: Fine-tuned on DeepSeek trajectories and Open Math Reasoning to excel at symbolic logic, arithmetic, and complex multi-step math problems, ideal for STEM education and competitions.
Trace-Based Fine-Tuning: Leverages SK1.1 trace dataset entries to model deep, interpretable reasoning paths, improving transparency and consistency in problem-solving.
Compact Code Understanding: Capable of understanding and generating efficient code snippets in Python, JavaScript, and more, supporting algorithmic explanations and lightweight coding tasks.
Factual and Instructional Precision: Trained on curated high-quality data with reasoning benchmarks to minimize hallucinations and strictly follow instructions for structured outputs (Markdown, JSON, tables).
Multilingual Capabilities: Supports over 20 languages for technical reasoning and translation, enhancing multilingual educational applications.
Optimized Performance for Resource-Constrained Environments: Balances reasoning capability with efficient resource use, suitable for deployment in environments with limited compute.
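
When asking the model for structured output such as JSON, it can help to validate the reply before passing it downstream. The following is a minimal sketch that assumes the reply may surround the JSON with prose. `extract_json` is a hypothetical helper, not part of the model or of Transformers.

```python
import json


def extract_json(reply):
    """Return the first JSON object or array found in a model reply, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        # Only positions that could open an object or array are worth trying.
        if char not in "{[":
            continue
        try:
            value, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # Not valid JSON from here; keep scanning.
        return value
    return None
```

Returning `None` instead of raising lets the caller decide whether to retry the prompt or fall back to the raw text.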

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Vulpecula-4B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve the equation: 3x + 7 = 22. Show all steps."

messages = [
    {"role": "system", "content": "You are a step-by-step math tutor."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
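
Because the model is trained on thinking trajectories, the decoded `response` may include a reasoning trace before the final answer. The sketch below separates the two under the assumption that the trace is wrapped in `<think>` and `</think>` tags. That tag format is an assumption and is not confirmed by this model card, and `split_reasoning` is a hypothetical helper.

```python
def split_reasoning(response, open_tag="<think>", close_tag="</think>"):
    """Split a decoded response into (reasoning, answer).

    Assumes the reasoning trace, if present, ends with `close_tag`.
    Returns an empty reasoning string when no closing tag is found.
    """
    head, sep, tail = response.partition(close_tag)
    if not sep:
        # No closing tag: treat the whole response as the answer.
        return "", response.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()
```

Calling `reasoning, answer = split_reasoning(response)` after the quickstart would then give the final answer on its own, with the trace available separately for inspection.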

Intended Use

Advanced mathematical and logical problem solving
Education-centric STEM tutoring and explanations
Code assistance and debugging for lightweight coding tasks
Structured content generation including JSON, Markdown, and tables
Multilingual reasoning and technical translation
Efficient deployment in low-resource settings with a focus on accuracy and stepwise reasoning