# Qwen 2.5 0.5B Instruct (GGUF Quantized)

This repository contains the GGUF-quantized version of the Qwen 2.5 0.5B Instruct model. It is an ultra-lightweight small language model (SLM) designed for edge devices, mobile phones, and IoT applications.

- **Model creator:** Qwen Team (Alibaba Cloud)
- **Quantized by:** Md Habibur Rahman (Aasif)
- **Quantization format:** GGUF (Q4_0)
- **Target devices:** Android, Raspberry Pi, low-end laptops

## ⚡ Performance

This quantization is fast on modest hardware and has minimal RAM requirements:

| Metric       | Value                  |
|--------------|------------------------|
| Model size   | ~350 MB                |
| RAM required | < 1 GB                 |
| Parameters   | 0.5 billion            |
| Speed (GPU)  | 100+ tokens/sec (est.) |
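The file size follows largely from the quantization scheme: Q4_0 packs 32 weights into 18 bytes (16 bytes of 4-bit values plus a 2-byte fp16 scale), i.e. 4.5 bits per weight. The back-of-envelope estimate below is a sketch under that assumption; the actual file (~350 MB) is somewhat larger, mainly because some tensors (such as token embeddings) are typically stored at higher precision.

```python
# Back-of-envelope size estimate for a fully-Q4_0 0.5B-parameter GGUF file.
# Q4_0 block layout assumed: 32 weights -> 16 bytes of nibbles + 2-byte fp16 scale.
PARAMS = 0.5e9          # 0.5 billion parameters (from the table above)
WEIGHTS_PER_BLOCK = 32
BYTES_PER_BLOCK = 18    # 16 bytes of 4-bit values + 2-byte scale

size_bytes = PARAMS / WEIGHTS_PER_BLOCK * BYTES_PER_BLOCK
size_mb = size_bytes / 1e6
print(f"~{size_mb:.0f} MB if every tensor were Q4_0")  # ~281 MB
```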

## 🚀 Usage

Install the dependencies first: `pip install llama-cpp-python huggingface_hub`.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized GGUF file from the Hub (cached after the first call)
model_path = hf_hub_download(
    repo_id="Habibur2/Qwen2.5-0.5B-GGUF",
    filename="qwen-2.5-0.5b-q4_0.gguf"
)

# n_ctx: context window size; n_gpu_layers=-1 offloads all layers to the GPU
# (use n_gpu_layers=0 for CPU-only inference)
llm = Llama(model_path=model_path, n_ctx=1024, n_gpu_layers=-1)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a hello world code in Python."}]
)

print(response['choices'][0]['message']['content'])
```
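`create_chat_completion` applies the model's chat template automatically. If you instead use the lower-level completion API (or serve the model through a runtime that expects raw prompts), you must format messages yourself. The sketch below assumes Qwen's standard ChatML layout (`<|im_start|>` / `<|im_end|>` markers); verify against the base model's tokenizer config if in doubt.

```python
# Minimal sketch of a ChatML prompt builder, as used by Qwen instruct models.
# build_chatml_prompt is a hypothetical helper, not part of llama-cpp-python.
def build_chatml_prompt(messages):
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a hello world code in Python."},
])
print(prompt)
```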
## Model Details

- **Base model:** Qwen/Qwen2.5-0.5B
- **Architecture:** qwen2
- **Quantization:** 4-bit (Q4_0)