| license: apache-2.0 | |
| language: | |
| - en | |
| library_name: diffusers | |
| pipeline_tag: image-to-image | |
| tags: | |
| - Image-to-Image | |
| - ControlNet | |
| - Diffusers | |
| - QwenImageControlNetPipeline | |
| - Qwen-Image | |
| base_model: Qwen/Qwen-Image | |
| # Qwen-Image-ControlNet-Union | |
| This repository provides a unified ControlNet that supports 4 common control types (canny, soft edge, depth, pose) for [Qwen-Image](https://github.com/QwenLM/Qwen-Image). | |
| # Model Card | |
| - This ControlNet consists of 5 double blocks copied from the pretrained transformer layers. | |
| - We train the model from scratch for 50K steps using a dataset of 10M high-quality general and human images. | |
| - We train at 1328x1328 resolution in BFloat16, batch size=64, learning rate=4e-5. We set the text drop ratio to 0.10. | |
| - This model supports multiple control modes: canny, soft edge, depth, and pose. You can use it like a regular ControlNet. | |
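To put the training figures above in perspective, here is a small back-of-the-envelope calculation. It assumes that the quoted batch size of 64 is the global batch size across all devices and that "10M" means exactly ten million images; neither is stated in the model card, so treat the result as a rough estimate only.

```python
# Figures quoted in the model card bullets.
steps = 50_000
batch_size = 64            # ASSUMPTION: global batch size across all devices
dataset_size = 10_000_000  # ASSUMPTION: "10M" taken as exactly ten million
resolution = 1328          # training resolution is 1328x1328

# Number of training samples drawn over the whole run.
samples_seen = steps * batch_size

# Fraction of the dataset covered if every sample were drawn only once.
dataset_passes = samples_seen / dataset_size

# Pixels per training image at the stated resolution.
pixels_per_image = resolution * resolution

print(samples_seen)      # 3200000
print(dataset_passes)    # 0.32
print(pixels_per_image)  # 1763584
```

Under these assumptions the run draws about 3.2M samples, roughly a third of one pass over the dataset.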
| # Showcases | |
| <table style="width:100%; table-layout:fixed;"> | |
| <tr> | |
| <td><img src="./conds/canny1.png" alt="canny"></td> | |
| <td><img src="./outputs/canny1.png" alt="canny"></td> | |
| </tr> | |
| <tr> | |
| <td><img src="./conds/soft_edge.png" alt="soft_edge"></td> | |
| <td><img src="./outputs/soft_edge.png" alt="soft_edge"></td> | |
| </tr> | |
| <tr> | |
| <td><img src="./conds/depth.png" alt="depth"></td> | |
| <td><img src="./outputs/depth.png" alt="depth"></td> | |
| </tr> | |
| <tr> | |
| <td><img src="./conds/pose.png" alt="pose"></td> | |
| <td><img src="./outputs/pose.png" alt="pose"></td> | |
| </tr> | |
| </table> | |
| # Inference | |
| ```python | |
| import torch | |
| from diffusers.utils import load_image | |
| # https://github.com/huggingface/diffusers/pull/12215 | |
| # pip install git+https://github.com/huggingface/diffusers | |
| from diffusers import QwenImageControlNetPipeline, QwenImageControlNetModel | |
| base_model = "Qwen/Qwen-Image" | |
| controlnet_model = "InstantX/Qwen-Image-ControlNet-Union" | |
| controlnet = QwenImageControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16) | |
| pipe = QwenImageControlNetPipeline.from_pretrained( | |
|     base_model, controlnet=controlnet, torch_dtype=torch.bfloat16 | |
| ) | |
| pipe.to("cuda") | |
| # canny | |
| # if the image contains text elements, it is highly recommended to include the 'TEXT' in the prompt | |
| control_image = load_image("conds/canny.png") | |
| prompt = "Aesthetics art, traditional asian pagoda, elaborate golden accents, sky blue and white color palette, swirling cloud pattern, digital illustration, east asian architecture, ornamental rooftop, intricate detailing on building, cultural representation." | |
| controlnet_conditioning_scale = 1.0 | |
| # soft edge | |
| # control_image = load_image("conds/soft_edge.png") | |
| # prompt = "Photograph of a young man with light brown hair jumping mid-air off a large, reddish-brown rock. He's wearing a navy blue sweater, light blue shirt, gray pants, and brown shoes. His arms are outstretched, and he has a slight smile on his face. The background features a cloudy sky and a distant, leafless tree line. The grass around the rock is patchy." | |
| # controlnet_conditioning_scale = 1.0 | |
| # depth | |
| # control_image = load_image("conds/depth.png") | |
| # prompt = "A swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. A beige couch with white cushions sits on a wooden floor, with a matching coffee table in front. The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. Sunlight pours through the leaves outside, casting cool shadows on the floor." | |
| # controlnet_conditioning_scale = 1.0 | |
| # pose | |
| # control_image = load_image("conds/pose.png") | |
| # prompt = "Photograph of a young man with light brown hair and a beard, wearing a beige flat cap, black leather jacket, gray shirt, brown pants, and white sneakers. He's sitting on a concrete ledge in front of a large circular window, with a cityscape reflected in the glass. The wall is cream-colored, and the sky is clear blue. His shadow is cast on the wall." | |
| # controlnet_conditioning_scale = 1.0 | |
| image = pipe( | |
|     prompt=prompt, | |
|     negative_prompt=" ", | |
|     control_image=control_image, | |
|     controlnet_conditioning_scale=controlnet_conditioning_scale, | |
|     width=control_image.size[0], | |
|     height=control_image.size[1], | |
|     num_inference_steps=30, | |
|     true_cfg_scale=4.0, | |
|     generator=torch.Generator(device="cuda").manual_seed(42), | |
| ).images[0] | |
| image.save("qwenimage_cn_union_result.png") | |
| ``` | |
| # Inference Setting | |
| You can adjust control strength via controlnet_conditioning_scale. | |
| - Canny: use cv2.Canny, set controlnet_conditioning_scale in [0.8, 1.0] | |
| - Soft Edge: use [AnylineDetector](https://github.com/huggingface/controlnet_aux), set controlnet_conditioning_scale in [0.8, 1.0] | |
| - Depth: use [depth-anything](https://github.com/DepthAnything/Depth-Anything-V2), set controlnet_conditioning_scale in [0.8, 1.0] | |
| - Pose: use [DWPose](https://github.com/IDEA-Research/DWPose/tree/onnx), set controlnet_conditioning_scale in [0.8, 1.0] | |
| We strongly recommend using detailed prompts, especially when the image includes text elements. For example, use "a poster with text 'InstantX Team' on the top" instead of "a poster". | |
| For inference with multiple conditions, please refer to the [diffusers PR](https://github.com/huggingface/diffusers/pull/12215). | |
| # ComfyUI Support | |
| [ComfyUI](https://www.comfy.org/) offers native support for Qwen-Image-ControlNet-Union. Check the [blog](https://blog.comfy.org/p/day-1-support-of-qwen-image-instantx) for more details. | |
| # Community Support | |
| [Liblib AI](https://www.liblib.art/) offers native support for Qwen-Image-ControlNet-Union. Visit [Liblib AI](https://www.liblib.art/sd) for online inference. | |
| # Limitations | |
| We find that the model is unable to preserve some details, such as small-font text, unless the prompt explicitly includes the 'TEXT'. | |
| # Acknowledgements | |
| This model is developed by the InstantX Team. All rights reserved. | |