Qwen-Image-Series

Pre-quantized Qwen-Image-2512 text-to-image model series by QuantFunc, with Lighting backend inference support.

Overview

Qwen-Image-2512 is a text-to-image diffusion model distilled from the Alibaba Qwen team's image generation model.

With the latest QuantFunc ComfyUI plugin, inference achieves a 2x–6x speedup over mainstream frameworks.

Hardware Requirements

  • Supports NVIDIA RTX 30 series and above
  • RTX 20 series does not support BF16, which causes significant precision loss in Qwen series model quantization scenarios. Therefore, the 20 series currently only supports Z-Image models.
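You can verify BF16 support from PyTorch before downloading; a minimal check (this uses the standard torch.cuda.is_bf16_supported() call and is independent of QuantFunc):

import torch

# True on RTX 30 series (Ampere) and newer; False on RTX 20 series (Turing)
print(torch.cuda.is_bf16_supported())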

Compatibility

  • The base models in this repository are compatible with any version of Qwen-Image transformer weights
  • The QuantFunc code plugin and ComfyUI plugin are 100% compatible with previous versions of Qwen-Image models

Directory Structure

Qwen-Image-Series/
├── qwen-image-series-50x-above-base-model/      # Base model, optimized for RTX 50 series and above
│   ├── text_encoder/                            # Qwen2.5-VL text encoder (pre-quantized)
│   ├── vae/                                     # 3D VAE decoder (~242MB)
│   ├── tokenizer/                               # Tokenizer
│   ├── scheduler/                               # Scheduler config
│   ├── model_index.json
│   └── quantfunc_config.json
├── qwen-image-series-50x-below-base-model/      # Base model, optimized for RTX 50 series and below
│   └── (same structure as above)
├── transformer/
│   ├── config.json
│   ├── qwen-image-2512-50x-above-lighting-4steps.safetensors           # RTX 50+ Lighting 4-step (~14GB)
│   ├── qwen-image-2512-50x-above-lighting-4steps-prequant.safetensors  # RTX 50+ Lighting pre-quantized (~11GB)
│   ├── qwen-image-2512-50x-below-lighting-4steps.safetensors           # RTX 30/40 Lighting 4-step (~14GB)
│   └── qwen-image-2512-50x-below-lighting-4steps-prequant.safetensors  # RTX 30/40 Lighting pre-quantized (~11GB)
├── prequant/                                                # Pre-quantized modulation weights
│   ├── qwen-image-2512-50x-above.safetensors                # RTX 50+ mod weights (2512)
│   ├── qwen-image-2512-50x-below.safetensors                # RTX 30/40 mod weights (2512)
│   ├── qwen-image-50x-above.safetensors                     # RTX 50+ mod weights (legacy)
│   └── qwen-image-50x-below.safetensors                     # RTX 30/40 mod weights (legacy)
└── precision-config/                                        # Lighting precision config samples
    ├── 50x-above-fp4-sample.json                            # FP4 config for RTX 50+
    └── 50x-below-int4-sample.json                           # INT4 config for RTX 30/40

Model Variants

| Variant | base-model | transformer | Total Size | Target GPU |
|---------|------------|-------------|------------|------------|
| 50x-above | qwen-image-series-50x-above-base-model | qwen-image-2512-50x-above-lighting-4steps.safetensors | ~14GB | RTX 50 series and above |
| 50x-below | qwen-image-series-50x-below-base-model | qwen-image-2512-50x-below-lighting-4steps.safetensors | ~14GB | RTX 30/40 series |

  • 50x-above: Optimized for RTX 50 series (Blackwell) and above
  • 50x-below: Optimized for RTX 30/40 series
  • 4steps: Distilled accelerated version; only 4 steps are needed to generate an image

The base-model and transformer must use the same variant (both above or both below).
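To avoid mismatched pairs, the variant can be derived from the local GPU; a minimal sketch using PyTorch's compute-capability query (the >= 10 threshold is an assumption based on Blackwell GPUs reporting SM 10.0 or higher, while RTX 30/40 report 8.6/8.9):

import torch

# Assumption: RTX 50 series (Blackwell) reports compute capability major >= 10
major, _ = torch.cuda.get_device_capability()
variant = "50x-above" if major >= 10 else "50x-below"

base_model = f"Qwen-Image-Series/qwen-image-series-{variant}-base-model"
transformer = (
    f"Qwen-Image-Series/transformer/"
    f"qwen-image-2512-{variant}-lighting-4steps.safetensors"
)
print(base_model, transformer, sep="\n")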

Quick Start

Download

# Shell: install the Hugging Face download client
pip install huggingface_hub

# Python: fetch the full repository snapshot
from huggingface_hub import snapshot_download
model_dir = snapshot_download('QuantFunc/Qwen-Image-Series')
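The full snapshot includes both variants; to fetch only one, snapshot_download accepts an allow_patterns filter. A sketch targeting the 50x-above files, with patterns inferred from the directory structure above:

from huggingface_hub import snapshot_download

# Download only the 50x-above variant plus the shared transformer config
model_dir = snapshot_download(
    'QuantFunc/Qwen-Image-Series',
    allow_patterns=[
        'qwen-image-series-50x-above-base-model/*',
        'transformer/config.json',
        'transformer/*50x-above*',
        'prequant/*50x-above*',
        'precision-config/50x-above*',
    ],
)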

Inference

# RTX 50 series
quantfunc \
  --model-dir Qwen-Image-Series/qwen-image-series-50x-above-base-model \
  --transformer Qwen-Image-Series/transformer/qwen-image-2512-50x-above-lighting-4steps.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a beautiful sunset over the ocean with dramatic clouds" \
  --output output.png --steps 4

# RTX 30/40 series
quantfunc \
  --model-dir Qwen-Image-Series/qwen-image-series-50x-below-base-model \
  --transformer Qwen-Image-Series/transformer/qwen-image-2512-50x-below-lighting-4steps.safetensors \
  --auto-optimize --model-backend lighting \
  --prompt "a beautiful sunset over the ocean with dramatic clouds" \
  --output output.png --steps 4

--auto-optimize automatically configures VRAM management, attention backend, and offload strategies based on your GPU.
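For batch generation, the same flags can be driven from Python; a minimal sketch that shells out to the CLI with subprocess (the prompts and output names are illustrative; the flags are exactly those from the RTX 50 series example above):

import subprocess

MODEL_DIR = "Qwen-Image-Series/qwen-image-series-50x-above-base-model"
TRANSFORMER = ("Qwen-Image-Series/transformer/"
               "qwen-image-2512-50x-above-lighting-4steps.safetensors")

prompts = [
    "a beautiful sunset over the ocean with dramatic clouds",
    "a misty forest at dawn, volumetric light",
]

for i, prompt in enumerate(prompts):
    # One CLI invocation per prompt; check=True stops on the first failure
    subprocess.run([
        "quantfunc",
        "--model-dir", MODEL_DIR,
        "--transformer", TRANSFORMER,
        "--auto-optimize", "--model-backend", "lighting",
        "--prompt", prompt,
        "--output", f"output_{i}.png",
        "--steps", "4",
    ], check=True)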

SVDQ vs. Lighting Backend

This repository provides Lighting backend models. Differences between the two backends:

| Feature | Lighting | SVDQ |
|---------|----------|------|
| Quantization | Per-layer mixed precision (FP4/INT4/FP8/INT8) | Nunchaku-based holistic pre-quantization |
| LoRA Integration | Real-time quantization: build a custom model in 5 minutes with zero speed loss, integrating any number of LoRAs | Runtime low-rank pathway |
| Ecosystem | QuantFunc native | Compatible with the widely adopted Nunchaku ecosystem, enhanced with Rotation quantization and Auto Rank dynamic rank optimization |
| Flexibility | Per-layer/sub-layer precision control | Precision fixed at export time |
| Use Cases | Rapid personal model customization, batch LoRA integration | Leveraging the Nunchaku ecosystem, runtime dynamic LoRA |

Pre-quantized Modulation Weights (prequant/)

The prequant/ directory contains pre-quantized modulation weights extracted from SVDQ models. Use them with the Lighting backend for high-quality modulation without runtime quantization overhead.

# From FP16 with mod weights (first run quantizes on-the-fly)
quantfunc \
  --model-dir Qwen-Image-Series/qwen-image-series-50x-above-base-model \
  --model-backend lighting \
  --precision-config Qwen-Image-Series/precision-config/50x-above-fp4-sample.json \
  --mod-weights Qwen-Image-Series/prequant/qwen-image-2512-50x-above.safetensors \
  --rotation-block-size 256 \
  --prompt "a beautiful sunset" --steps 4 --auto-optimize

Alternatively, use the pre-quantized Lighting transformer for instant loading (no runtime quantization):

quantfunc \
  --model-dir Qwen-Image-Series/qwen-image-series-50x-above-base-model \
  --transformer Qwen-Image-Series/transformer/qwen-image-2512-50x-above-lighting-4steps-prequant.safetensors \
  --model-backend lighting \
  --prompt "a beautiful sunset" --steps 4 --auto-optimize

Precision Config (precision-config/)

Sample per-layer precision configurations for the Lighting backend:

| File | Target GPU | Precision |
|------|------------|-----------|
| 50x-above-fp4-sample.json | RTX 50+ | FP4 attention + AF8WF4 MLP fc2 + INT8 modulation |
| 50x-below-int4-sample.json | RTX 30/40 | INT4 all layers + INT8 modulation |
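The sample files are ordinary JSON, so you can inspect or adapt one before pointing --precision-config at it; a minimal read-only sketch that makes no assumption about the schema:

import json

path = 'Qwen-Image-Series/precision-config/50x-above-fp4-sample.json'
with open(path) as f:
    cfg = json.load(f)

# Print the top-level keys to see how the per-layer precisions are organized
print(list(cfg))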

License

The pre-quantized model weights in this repository are derived from the original models. Users must comply with the original model's license agreement. The QuantFunc inference engine and its plugins (including the ComfyUI plugin) are licensed separately; see official QuantFunc channels for details.

For models quantized from commercially licensed models, users are responsible for obtaining the necessary commercial licenses from the original model providers.
