Guide to run Kimi K2.5 locally on your device.

#19 · opened by shimmyshimmer

Hey guys, we made a guide to run the model locally. You'll need 240GB of RAM or unified memory for best results.

Note that VRAM is not required.
You can run it on a Mac with 256GB of unified memory at similar speeds, or with 256GB of RAM and no VRAM at all.

You can even run it with much less memory (e.g. 80GB RAM) since it'll offload, but it'll be slower.

Guide: https://unsloth.ai/docs/models/kimi-k2.5
GGUFs to run: https://huggingface.co/unsloth/Kimi-K2.5-GGUF
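
If you'd rather poke at a downloaded quant from Python instead of the CLI route in the guide, here's a minimal sketch using llama-cpp-python. The file path, context size, and quant choice below are placeholders for whatever you actually downloaded from unsloth/Kimi-K2.5-GGUF, not the exact names in the repo.

```python
# Minimal sketch, assuming llama-cpp-python is installed and a quant
# (e.g. Q4_K_M) has already been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    # Placeholder path: point this at the first .gguf shard of your quant.
    model_path="./Kimi-K2.5-GGUF/Q4_K_M/first-shard.gguf",
    n_ctx=16384,     # context window; raise it if you have memory to spare
    n_gpu_layers=0,  # 0 = pure CPU/RAM; set >0 to offload some layers to a GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```

llama.cpp memory-maps the weights by default, which is the mechanism behind the "much less RAM, but slower" note above.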


What's the quality of the output? Does it give the same quality in writing and tool calling for agentic work as the full model?

Hi @youhanasheriff ,

Great question! Here's what you should expect from the GGUF quantized versions:

Quality Expectations

| Quantization | Size | Quality Impact |
|---|---|---|
| Q8_0 | ~530GB | Virtually identical to FP16 (<1% degradation) |
| Q6_K | ~400GB | Excellent quality, minimal loss |
| Q4_K_M | ~280GB | Good quality, slight degradation on complex tasks |
| Q3_K_M | ~210GB | Noticeable quality drop, still usable |
| Q2_K | ~150GB | Significant degradation, for testing only |

For Agentic/Tool Calling

Tool calling and agentic tasks are more sensitive to quantization than general chat, for a few reasons (see the self-check sketch after this list):

  1. Structured JSON output requires precise token prediction
  2. Multi-step reasoning accumulates small errors
  3. Code generation needs exact syntax
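
To make point 1 concrete, here's a rough self-check you could run per quant: ask for strict JSON a few dozen times and count how often the output actually parses into the expected shape. It assumes you're serving the GGUF behind an OpenAI-compatible endpoint (for example llama.cpp's llama-server); the URL, model name, prompt, and trial count are all placeholders.

```python
# Rough JSON-validity spot check against a locally served quant.
# Endpoint URL and model name are placeholders for your local setup.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

PROMPT = (
    "Return ONLY a JSON object with keys 'city' (string) and "
    "'population' (integer) for the largest city in Japan."
)

trials, valid = 20, 0
for _ in range(trials):
    resp = client.chat.completions.create(
        model="kimi-k2.5",  # placeholder: whatever name your server exposes
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0.6,
        max_tokens=128,
    )
    text = (resp.choices[0].message.content or "").strip()
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        continue
    if isinstance(obj, dict) and isinstance(obj.get("city"), str) \
            and isinstance(obj.get("population"), int):
        valid += 1

print(f"valid structured outputs: {valid}/{trials}")
```

A crude pass rate like this is no benchmark, but it's a quick way to see whether a smaller quant holds up on your own tool-calling prompts before committing to it.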

Recommendations:

  • For serious agentic work: Q6_K or Q8_0 (a download snippet follows this list)
  • For casual use/testing: Q4_K_M works reasonably well
  • Avoid Q3 and below for tool calling
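
If you settle on Q6_K or Q8_0, you don't need to pull the whole repo; something like the following fetches only one quant's files. The glob pattern and target directory are assumptions on my side, so check the repo's file layout on Hugging Face first.

```python
# Minimal sketch: download only the Q6_K files from the GGUF repo.
# Assumes huggingface_hub is installed; pattern/local_dir are placeholders.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Kimi-K2.5-GGUF",
    allow_patterns=["*Q6_K*"],     # swap for "*Q8_0*" etc. to match your pick
    local_dir="./Kimi-K2.5-GGUF",  # placeholder download location
)
```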

Reality Check

The full FP16/INT4 model on GPU clusters will always outperform GGUF on CPU/RAM, but for local experimentation and development, the Q6_K/Q8_0 quantizations are remarkably good.

The Unsloth team has done excellent work optimizing these quantizations specifically for Kimi-K2.5.

Hope this helps!
