Q-Judger
A fine-tuned judge model for evaluating text-to-image (T2I) generation quality. Built on top of Qwen3.6-27B, it scores generated images across 5 hierarchical dimensions using structured checklists and outputs JSON-formatted evaluation results.
Links
| Resource | Link |
|---|---|
| 📑 Paper | https://arxiv.org/abs/2605.28091 |
| 📊 Benchmark Dataset (HuggingFace) | https://huggingface.co/datasets/Qwen/Qwen-Image-Bench |
| 📊 Benchmark Dataset (ModelScope) | https://www.modelscope.cn/datasets/Qwen/Qwen-Image-Bench |
| 💻 GitHub | https://github.com/QwenLM/Qwen-Image-Bench |
| 🧑‍⚖️ Q-Judger Model (HuggingFace) | https://huggingface.co/Qwen/Qwen-Image-Bench |
| 🧑‍⚖️ Q-Judger Model (ModelScope) | https://modelscope.cn/models/Qwen/Qwen-Image-Bench |
Model Description
Q-Judger is a vision-language model fine-tuned specifically for automated evaluation of images produced by text-to-image models. Given a text prompt and a generated image, the model evaluates the image on fine-grained quality criteria organized in a 3-level hierarchy and outputs structured JSON scores.
- Base Model: Qwen3.6-27B
- Task: Image quality evaluation / judging
- Input: Text prompt + generated image
- Output: Structured JSON with per-dimension scores (0 = Fail, 1 = Pass, 2 = Excel, N/A)
- Thinking Mode: Enabled — the model uses chain-of-thought reasoning before producing the final JSON output
Evaluation Dimensions
The model evaluates images across 5 top-level dimensions, each with multiple sub-dimensions:
Quality
- Realism: Physical Logic, Material Texture
- Detail: Noise, Edge Clarity, Naturalness
- Resolution: Resolution
Aesthetics
- Composition: Composition
- Color Harmony: Color Harmony
- Lighting: Lighting & Atmosphere
- Anatomical Portraiture: Anatomical Fidelity
- Emotional Expression: Emotional Expression
- Style Control: Style Control
Alignment
- Attributes: Quantity, Facial Expression, Material Properties, Color, Shape, Size
- Actions: Contact Interaction, Non-contact Interaction, Full-body Action
- Layout: 2D Space, 3D Space
- Relations: Composition Relationship, Difference/Similarity, Containment
- Scene: Real-world Scene, Virtual Scene
Real-world Fidelity
- Fairness: Social Bias, Cultural Fairness
- Safety & Compliance: Safety & Compliance
- World Knowledge: Animals, Objects, Information Visualization, Temporal Characteristics, Cultural Elements
Creative Generation
- Imagination: Imagination
- Feature Matching: Feature Matching
- Logical Resolution: Logical Resolution
- Text Rendering: Text Accuracy, Text Layout, Font, Cross-lingual Generation
- Design Applications: Graphic Design, Product Design, Spatial Design, Fashion Styling, Game Design, Art Design
- Visual Storytelling: Cinematic Style, Camera / Lens Style, Storyboard Creation, Shot Sizes, Composition, Angles, Comic Creation
Scoring Methodology
Raw Score Mapping
| Raw Score | Meaning | Mapped Score |
|---|---|---|
| 0 | Fail | 0 |
| 1 | Pass | 60 |
| 2 | Excel | 100 |
| N/A | Not applicable | Excluded |
Aggregation
- Level-3 → Level-2: Average all non-N/A Level-3 scores within a Level-2 category
- Level-2 → Level-1: Average all Level-2 scores within a Level-1 dimension
- Level-1 → Total: Average all Level-1 dimension scores
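The mapping and the three averaging steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the function names `aggregate_level1` and `aggregate_total` are hypothetical, and skipping a Level-2 category whose items are all N/A is an assumption that the text above does not spell out. The sample input is the Quality example from the Output Format section.

```python
from statistics import mean

# Raw judge score -> mapped score, per the Raw Score Mapping table.
SCORE_MAP = {0: 0.0, 1: 60.0, 2: 100.0}


def aggregate_level1(level1_result):
    """Return the Level-1 score for one dimension, or None if nothing is scorable.

    `level1_result` has the shape {"Level-2": {"Level-3": {"score": 0|1|2|"N/A"}}}.
    """
    level2_scores = []
    for level3_items in level1_result.values():
        mapped = [
            SCORE_MAP[item["score"]]
            for item in level3_items.values()
            if item["score"] != "N/A"  # N/A items are excluded from the average
        ]
        # Assumption: a Level-2 category whose items are all N/A is skipped.
        if mapped:
            level2_scores.append(mean(mapped))
    return mean(level2_scores) if level2_scores else None


def aggregate_total(all_dimensions):
    """Average the Level-1 scores of every dimension that could be scored."""
    level1_scores = [aggregate_level1(d) for d in all_dimensions.values()]
    level1_scores = [s for s in level1_scores if s is not None]
    return mean(level1_scores) if level1_scores else None


quality = {
    "Realism": {"Physical Logic": {"score": 1}, "Material Texture": {"score": 2}},
    "Detail": {"Noise": {"score": 1}, "Edge Clarity": {"score": 1}, "Naturalness": {"score": 1}},
    "Resolution": {"Resolution": {"score": 2}},
}
print(aggregate_level1(quality))  # Realism 80, Detail 60, Resolution 100 -> 80.0
```

Because each level is averaged separately, a Level-2 category with one sub-dimension (such as Resolution) carries the same weight as one with three (such as Detail).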
Human Agreement
We validate the judge model against human experts by computing the Spearman rank correlation ($\rho$) between the model's rankings and the human expert rankings, for each of the five Level-1 dimensions and for the overall score. All correlations are statistically significant ($p < 10^{-4}$, $N = 18$ models).
| Dimension | Spearman $\rho$ |
|---|---|
| Quality | 0.89 |
| Aesthetics | 0.89 |
| Alignment | 0.89 |
| Real-world Fidelity | 0.92 |
| Creative Generation | 0.92 |
| Overall | 0.92 |
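For readers who want to run the same kind of check on their own rankings, the sketch below computes Spearman's $\rho$ as the Pearson correlation of the two rank vectors, with tied values sharing their average rank. It uses only the standard library; `scipy.stats.spearmanr` would give the coefficient together with a p-value. The scores are made up for illustration and are not taken from the paper, and the helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n (ascending), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five T2I models (not data from the paper).
judge_scores = [71.2, 64.5, 80.1, 58.3, 75.0]
human_scores = [3.9, 3.1, 4.6, 3.4, 4.2]
print(round(spearman_rho(judge_scores, human_scores), 2))  # 0.9
```

Only the ordering of the models matters here, so the judge's 0-100 scores and a human rating on any other scale can be compared directly.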
Quick Start
Get the Inference Code
git clone https://github.com/QwenLM/Qwen-Image-Bench.git
cd Qwen-Image-Bench
Installation
1. Create and activate a virtual environment with uv:
uv venv myenv --python 3.11
source myenv/bin/activate
2. Install PyTorch (select the command matching your CUDA version):
See the official guide: https://pytorch.org/get-started/locally/
3. Install Python dependencies:
uv pip install -r requirements.txt
This installs all required dependencies, including ms-swift.
Run Inference
python judge.py \
--input your_data.jsonl \
--model Qwen/Qwen-Image-Bench
Input Format
Prepare a CSV, JSON, or JSONL file with the following columns:
| Column | Type | Description |
|---|---|---|
| `ID` | int | Prompt identifier (1-1000), must match benchmark metadata |
| `prompt` | str | The text prompt used to generate the image |
| `image_path` | str | Path to the generated image file |
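As an illustration of the JSONL variant, the sketch below writes one JSON object per line with the three columns from the table. It assumes that the column names are used verbatim as JSON keys; the helper `write_judge_input` is hypothetical, and the prompts and image paths are placeholders. In real use, each `ID` and `prompt` must come from the benchmark dataset.

```python
import json
from pathlib import Path


def write_judge_input(records, output_path):
    """Write one JSON object per line with the ID, prompt, and image_path columns."""
    required = ("ID", "prompt", "image_path")
    with Path(output_path).open("w", encoding="utf-8") as handle:
        for record in records:
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"record is missing columns: {missing}")
            # The table above restricts IDs to the benchmark range 1-1000.
            if not 1 <= int(record["ID"]) <= 1000:
                raise ValueError(f"ID out of range 1-1000: {record['ID']}")
            row = {key: record[key] for key in required}
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


# Hypothetical prompts and image paths, for illustration only.
records = [
    {"ID": 1, "prompt": "A red fox jumping over a frozen stream", "image_path": "outputs/0001.png"},
    {"ID": 2, "prompt": "A neon sign that reads OPEN 24 HOURS", "image_path": "outputs/0002.png"},
]
write_judge_input(records, "your_data.jsonl")
```

The resulting `your_data.jsonl` matches the file name passed to `--input` in the Run Inference command.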
Output Format
The model outputs one JSON object per Level-1 dimension, structured as:
{
  "Level-2 Dimension": {
    "Level-3 Dimension": {"score": 0|1|2|"N/A"}
  }
}
Example (Quality dimension):
{
  "Realism": {
    "Physical Logic": {"score": 1},
    "Material Texture": {"score": 2}
  },
  "Detail": {
    "Noise": {"score": 1},
    "Edge Clarity": {"score": 1},
    "Naturalness": {"score": 1}
  },
  "Resolution": {
    "Resolution": {"score": 2}
  }
}
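Since thinking mode is enabled, raw model text may contain reasoning before the JSON object. The sketch below shows one way to extract and validate that object. It rests on an assumption not stated in this card: that any reasoning ends with a closing `</think>` tag and that the JSON object is the first `{...}` after it. The helper `parse_judge_output` is hypothetical, and the repository's `judge.py` may handle this differently.

```python
import json

# Allowed values of "score", per the output schema above.
VALID_SCORES = (0, 1, 2, "N/A")


def parse_judge_output(raw_text):
    """Extract and validate the JSON object that follows the model's reasoning.

    Assumption: reasoning, if present, ends with a closing </think> tag and the
    JSON object is the first "{...}" after it.
    """
    # rpartition returns the whole text as `answer` when the tag is absent.
    _, _, answer = raw_text.rpartition("</think>")
    start = answer.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value and ignores any trailing text.
    result, _ = json.JSONDecoder().raw_decode(answer[start:])
    for level2, level3_items in result.items():
        for level3, item in level3_items.items():
            if item.get("score") not in VALID_SCORES:
                raise ValueError(f"invalid score for {level2} / {level3}: {item}")
    return result


raw = '<think>The image is sharp at full size.</think>\n{"Resolution": {"Resolution": {"score": 2}}}'
print(parse_judge_output(raw))  # {'Resolution': {'Resolution': {'score': 2}}}
```

Validating scores at parse time surfaces malformed generations early, before they reach the aggregation step.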
CLI Options
| Argument | Default | Description |
|---|---|---|
| `--input` | (required) | Input CSV/JSON/JSONL with ID, prompt, image_path |
| `--model` | (required) | HuggingFace model ID or local model path |
| `--hf-bench-repo` | - | HF dataset repo for bench metadata |
| `--local-metadata` | - | Local metadata file path (overrides default) |
| `--max-batch-size` | 24 | ms-swift max_batch_size |
| `--max-new-tokens` | 4096 | Max generation tokens |
Inference Parameters
The judge model uses fixed inference parameters for reproducibility:
| Parameter | Value |
|---|---|
| `seed` | 42 |
| `temperature` | 0 |
| `top_k` | 1 |
| `top_p` | 1.0 |
| `repetition_penalty` | 1.05 |
| `max_new_tokens` | 4096 |
| `enable_thinking` | True |
| `max_batch_size` | 24 |
Citation
If you find this model useful, please cite our paper:
@misc{li2026qwenimagebenchgenerationcreationtexttoimage,
title={Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation},
author={Niantong Li and Guangzheng Hu and Weixu Qiao and Ying Ba and Qichen Hong and Shijun Shen and Jinlin Wang and Fan Zhou and Jianye Kang and Xin Shang and Ziyi He and Wei Wang and Dalin Li and Jiahao Li and Jie Zhang and Kaiyuan Gao and Kun Yan and Lihan Jiang and Ningyuan Tang and Shengming Yin and Tianhe Wu and Xiao Xu and Xiaoyue Chen and Yuxiang Chen and Yan Shu and Yanran Zhang and Yilei Chen and Yixian Xu and Zekai Zhang and Zhendong Wang and Zihao Liu and Zikai Zhou and Hongzhu Shi and Yi Wang and Bing Zhao and Hu Wei and Lin Qu and Chenfei Wu},
year={2026},
eprint={2605.28091},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.28091},
}
License
This project is licensed under the Apache License 2.0.