The sections below show how to use luminousresearch/L0-PolyCore-4B-Base with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use luminousresearch/L0-PolyCore-4B-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="luminousresearch/L0-PolyCore-4B-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("luminousresearch/L0-PolyCore-4B-Base")
model = AutoModelForCausalLM.from_pretrained("luminousresearch/L0-PolyCore-4B-Base", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use luminousresearch/L0-PolyCore-4B-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "luminousresearch/L0-PolyCore-4B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "luminousresearch/L0-PolyCore-4B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
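Because this is a base model (see Notes), the plain text-completion route may suit it better than the chat route. The following is a minimal sketch using only the Python standard library. It assumes the vLLM server started above is listening on localhost:8000 and exposes the OpenAI-compatible `/v1/completions` route. The helper names `build_completion_request` and `complete`, the example prompt, and the sampling values are illustrative assumptions, not part of the model card.

```python
import json
import urllib.request

MODEL_ID = "luminousresearch/L0-PolyCore-4B-Base"


def build_completion_request(prompt, base_url="http://localhost:8000", max_tokens=128):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,          # raw text to continue, no chat roles
        "max_tokens": max_tokens,  # illustrative value
        "temperature": 0.2,        # illustrative value
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated continuation text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example call (requires the server to be running):
# complete("/* Reverse a string in place. */\nvoid reverse(char *s) {\n")
```

For the SGLang server described below, the same sketch should work with `base_url="http://localhost:30000"`, assuming that server also exposes `/v1/completions`.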

Use Docker

docker model run hf.co/luminousresearch/L0-PolyCore-4B-Base

SGLang

How to use luminousresearch/L0-PolyCore-4B-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "luminousresearch/L0-PolyCore-4B-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "luminousresearch/L0-PolyCore-4B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "luminousresearch/L0-PolyCore-4B-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "luminousresearch/L0-PolyCore-4B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use luminousresearch/L0-PolyCore-4B-Base with Docker Model Runner:
```
docker model run hf.co/luminousresearch/L0-PolyCore-4B-Base
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Training Data

This model was trained on a curated dataset of C/C++ code released under a range of licenses (including GPL-2.0, Apache-2.0, MIT, public domain, and some source-available licenses). The original authors of that code are not affiliated with or responsible for this model.

Base Model

Base model: Qwen/Qwen3-4B-Base

Fine-tuning Method

Adapter: QLoRA
Method: continued pretraining (CPT)
Precision: trained with 4-bit quantized base weights and BF16 compute; the result was then merged and saved as safetensors
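As a rough back-of-the-envelope illustration of why 4-bit base weights help on a single consumer GPU, the sketch below estimates weight-only memory. It assumes exactly 4e9 parameters (the card states "4B params"), 0.5 bytes per parameter for 4-bit storage, and 2 bytes per parameter for BF16. It ignores quantization constants, adapter weights, activations, optimizer state, and the KV cache, so real usage will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 4e9  # assumption: the nominal "4B params" from the model card

four_bit_gb = weight_memory_gb(NUM_PARAMS, 4)   # base weights during QLoRA training
bf16_gb = weight_memory_gb(NUM_PARAMS, 16)      # merged BF16 checkpoint

print(f"4-bit base weights:  ~{four_bit_gb:.1f} GB")
print(f"BF16 merged weights: ~{bf16_gb:.1f} GB")
```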

Training Details

Training time: ~74 hours
Hardware: 1x NVIDIA RTX 5060 Ti

Notes

This is an L0 base model: it is not instruction-tuned and may be more verbose than an instruct model when given strict formatting requests.
The recommended usage is raw code continuation, or pairing the model with an external prompt-template strategy.
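One way to approach raw code continuation is to hand the model the beginning of a C source file and let it continue, rather than wrapping the request in chat roles. The sketch below is an assumption about how that might look: `build_continuation_prompt` and `continue_code` are hypothetical helpers, and the prompt layout (includes, a doc comment, then an opening brace) is one possible template strategy, not one documented by the model authors.

```python
MODEL_ID = "luminousresearch/L0-PolyCore-4B-Base"


def build_continuation_prompt(doc_comment, signature, includes=()):
    """Lay out a C source prefix for the model to continue."""
    lines = [f"#include <{header}>" for header in includes]
    if lines:
        lines.append("")  # blank line between includes and the function
    lines.append(f"/* {doc_comment} */")
    lines.append(f"{signature} {{")
    return "\n".join(lines) + "\n"


def continue_code(prompt, max_new_tokens=128):
    """Generate a continuation from a plain string (no chat template)."""
    # Imported lazily: loading the pipeline downloads the model on first use.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID)
    result = pipe(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
    return result[0]["generated_text"]


prompt = build_continuation_prompt(
    "Return the number of set bits in x.",
    "unsigned popcount(unsigned x)",
    includes=("stdint.h",),
)
print(prompt)
```

Calling `continue_code(prompt)` would then return only the newly generated text, since `return_full_text=False` drops the prompt from the output.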

Intended Use

Code generation for C/C++
Fast code completion
Examples and prototyping

Constraints

May produce incorrect code.
May reproduce identifiable upstream code fragments (including license headers) when prompted.
Verify outputs, especially for memory safety and security-sensitive code.
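The constraints above suggest screening generated code before using it. The following is a minimal sketch of two such checks: a scan for text that looks like a copied license header, and a compiler command that enables warnings plus the address and undefined-behavior sanitizers. The marker list is illustrative and far from exhaustive, the helper names are hypothetical, and `gcc` is assumed as the default compiler; neither check replaces code review or testing.

```python
import re
import shutil
import subprocess

# Illustrative, non-exhaustive patterns that often appear in license headers.
LICENSE_MARKERS = [
    r"SPDX-License-Identifier",
    r"GNU General Public License",
    r"Licensed under the Apache License",
    r"Permission is hereby granted, free of charge",
    r"Copyright\s*(\(c\)|©)",
]


def find_license_markers(generated_code):
    """Return the marker patterns that match the generated text."""
    return [
        pattern
        for pattern in LICENSE_MARKERS
        if re.search(pattern, generated_code, flags=re.IGNORECASE)
    ]


def sanitizer_build_command(source_path, output_path, compiler="gcc"):
    """Command that compiles with warnings plus Address/UB sanitizers."""
    return [
        compiler, "-Wall", "-Wextra", "-g",
        "-fsanitize=address,undefined",
        source_path, "-o", output_path,
    ]


def try_build(source_path, output_path, compiler="gcc"):
    """Run the build if the compiler is installed; return None otherwise."""
    if shutil.which(compiler) is None:
        return None
    return subprocess.run(
        sanitizer_build_command(source_path, output_path, compiler),
        capture_output=True,
        text=True,
    )


sample = "/* SPDX-License-Identifier: GPL-2.0 */\nint main(void) { return 0; }\n"
print(find_license_markers(sample))
```

A non-empty result from `find_license_markers` is a prompt to check where the fragment came from, and running the sanitizer build under real inputs can surface some memory errors at runtime.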

Safetensors

Model size: 4B params
Tensor type: BF16
