# Qwen 2.5 0.5B Instruct (GGUF Quantized)

This repository contains the GGUF-quantized version of the Qwen 2.5 0.5B Instruct model. It is an ultra-lightweight small language model (SLM) designed for edge devices, mobile phones, and IoT applications.

- **Model creator:** Qwen Team (Alibaba Cloud)
- **Quantized by:** Md Habibur Rahman (Aasif)
- **Quantization format:** GGUF (Q4_0)
- **Target devices:** Android, Raspberry Pi, low-end laptops

## ⚡ Performance

This quantization is fast on modest hardware and has minimal RAM requirements:

| Metric       | Value                  |
|--------------|------------------------|
| Model size   | ~350 MB                |
| RAM required | < 1 GB                 |
| Parameters   | 0.5 billion            |
| Speed (GPU)  | 100+ tokens/sec (est.) |
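The file size follows largely from the quantization scheme: Q4_0 packs 32 weights into 18 bytes (16 bytes of 4-bit values plus a 2-byte fp16 scale), i.e. 4.5 bits per weight. The back-of-envelope estimate below is a sketch under that assumption; the actual file (~350 MB) is somewhat larger, mainly because some tensors (such as token embeddings) are typically stored at higher precision.

```python
# Back-of-envelope size estimate for a fully-Q4_0 0.5B-parameter GGUF file.
# Q4_0 block layout assumed: 32 weights -> 16 bytes of nibbles + 2-byte fp16 scale.
PARAMS = 0.5e9          # 0.5 billion parameters (from the table above)
WEIGHTS_PER_BLOCK = 32
BYTES_PER_BLOCK = 18    # 16 bytes of 4-bit values + 2-byte scale

size_bytes = PARAMS / WEIGHTS_PER_BLOCK * BYTES_PER_BLOCK
size_mb = size_bytes / 1e6
print(f"~{size_mb:.0f} MB if every tensor were Q4_0")  # ~281 MB
```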

## 🚀 Usage

Install the dependencies first: `pip install llama-cpp-python huggingface_hub`.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized GGUF file from the Hub (cached after the first call)
model_path = hf_hub_download(
    repo_id="Habibur2/Qwen2.5-0.5B-GGUF",
    filename="qwen-2.5-0.5b-q4_0.gguf"
)

# n_ctx: context window size; n_gpu_layers=-1 offloads all layers to the GPU
# (use n_gpu_layers=0 for CPU-only inference)
llm = Llama(model_path=model_path, n_ctx=1024, n_gpu_layers=-1)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a hello world code in Python."}]
)

print(response['choices'][0]['message']['content'])
```
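`create_chat_completion` applies the model's chat template automatically. If you instead use the lower-level completion API (or serve the model through a runtime that expects raw prompts), you must format messages yourself. The sketch below assumes Qwen's standard ChatML layout (`<|im_start|>` / `<|im_end|>` markers); verify against the base model's tokenizer config if in doubt.

```python
# Minimal sketch of a ChatML prompt builder, as used by Qwen instruct models.
# build_chatml_prompt is a hypothetical helper, not part of llama-cpp-python.
def build_chatml_prompt(messages):
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a hello world code in Python."},
])
print(prompt)
```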
## Model Details

- **Base model:** Qwen/Qwen2.5-0.5B
- **Architecture:** qwen2
- **Quantization:** 4-bit (Q4_0)