	---
	license: apache-2.0
	language:
	- en
	tags:
	- text-to-image
	- image-customization
	- diffusion-transformer
	- position-control
	- multi-subject
	- safetensors
	---

	<h3 align="center">
	PositionIC: Unified Position and Identity Consistency for Image Customization
	</h3>

	<p align="center">
	<a href="https://arxiv.org/abs/2507.13861"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.13861-b31b1b.svg"></a>
	<a href="https://huggingface.co/ScottHan/PositionIC"><img alt="Hugging Face Model" src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
	</p>

	<p align="center">
	<span style="font-family: Gill Sans">Junjie Hu,</span>
	<span style="font-family: Gill Sans">Tianyang Han,</span>
	<span style="font-family: Gill Sans">Kai Ma,</span>
	<span style="font-family: Gill Sans">Jialin Gao,</span>
	<span style="font-family: Gill Sans">Song Yang</span>
	<br>
	<span style="font-family: Gill Sans">Xianhua He,</span>
	<span style="font-family: Gill Sans">Junfeng Luo,</span>
	<span style="font-family: Gill Sans">Xiaoming Wei,</span>
	<span style="font-family: Gill Sans">Wenqiang Zhang</span>
	</p>

	---

	### 🔥 News
	- ✅ [2026.01.12] We have released our PositionIC model for FLUX on Hugging Face and [GitHub](https://github.com/MeiGen-AI/PositionIC)!
	- ✅ [2025.07.18] Our paper is now available on [arXiv](https://arxiv.org/abs/2507.13861).
	- ⬜ Datasets and PositionIC-v2 model with enhanced generation capabilities are coming soon.

	---

	## 📖 Introduction
	PositionIC is a unified framework for high-fidelity, spatially controllable multi-subject image customization. While recent methods excel in fidelity, fine-grained instance-level spatial control remains a challenge due to the entanglement of identity and layout.

	To address this, we introduce:
	1. BMPDS: The first automatic data-synthesis pipeline for position-annotated multi-subject datasets, providing crucial spatial supervision.
	2. Lightweight Layout-Aware Diffusion: A framework integrating a novel visibility-aware attention mechanism that explicitly models spatial relationships via NeRF-inspired volumetric weight regulation.
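
The "NeRF-inspired volumetric weight regulation" in item 2 can be illustrated with the standard front-to-back compositing rule from volume rendering: each layer's weight is its own opacity multiplied by the transmittance left over by the layers in front of it. The sketch below is only that textbook rule in plain Python. The function name `volumetric_weights` and the idea of treating overlapping subjects as ordered opacity layers are assumptions made for illustration, not the PositionIC implementation.

```python
from typing import List


def volumetric_weights(alphas: List[float]) -> List[float]:
    """Front-to-back compositing weights, as used in NeRF volume rendering.

    w_i = alpha_i * prod_{j < i} (1 - alpha_j)
    """
    weights = []
    transmittance = 1.0  # fraction of visibility not yet absorbed by nearer layers
    for alpha in alphas:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("each alpha must lie in [0, 1]")
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


if __name__ == "__main__":
    # Hypothetical example: three overlapping layers, ordered front to back.
    print(volumetric_weights([0.5, 0.5, 1.0]))  # [0.5, 0.25, 0.25]
```

A fully opaque front layer (`alpha = 1.0`) drives every later weight to zero, which is the occlusion behavior a visibility-aware mechanism would need to respect.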

	Our experiments show that PositionIC achieves state-of-the-art spatial precision and identity consistency in multi-subject scenarios.

	---

	## ⚡️ Quick Start

	### 🔧 Requirements and Installation
	Follow these steps to set up your environment:

	```bash
	# 1. Clone the repository (it provides requirements.txt and the inference script)
	git clone https://github.com/MeiGen-AI/PositionIC
	cd PositionIC

	# 2. Create and activate a new conda environment
	conda create -n PositionIC python=3.10 -y
	conda activate PositionIC

	# 3. Install PyTorch (adjust the index URL to match your CUDA version)
	pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

	# 4. Install project dependencies
	pip install -r requirements.txt
	```
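
After installation, a quick check can confirm that the key packages are visible to the active interpreter. The sketch below uses only the standard library and does not import the packages themselves. The helper name `check_environment` is hypothetical, and the default package list simply mirrors the `pip install torch torchvision` step above.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def check_environment(packages: Iterable[str] = ("torch", "torchvision")) -> Dict[str, bool]:
    """Report which top-level packages can be located, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `torch` is reported as found, running `python -c "import torch; print(torch.cuda.is_available())"` then shows whether the CUDA build matches your driver.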

	---

	## ✍️ Inference
	To generate images with precise position and identity control, run the following command:

	```bash
	python inference_.py \
	--eval_json_path "path/to/your/val_config.json" \
	--dit_lora_path "ScottHan/PositionIC" \
	--saved_dir "./res" \
	--width 1024 \
	--height 1024 \
	--ref_size 512 \
	--seed 3074 \
	--rope_type "uno" \
	--a 5
	```
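
For repeated runs, for example sweeping over seeds, the command above can be assembled programmatically. The following is a minimal sketch: the flag names and default values are copied from the command above, while the helper name `build_inference_command` and the seed sweep are hypothetical. It only prints the commands, so nothing is launched unless you pass the list to `subprocess.run` yourself.

```python
import shlex
import sys
from typing import List


def build_inference_command(
    eval_json_path: str,
    dit_lora_path: str = "ScottHan/PositionIC",
    saved_dir: str = "./res",
    width: int = 1024,
    height: int = 1024,
    ref_size: int = 512,
    seed: int = 3074,
    rope_type: str = "uno",
    a: int = 5,
    script: str = "inference_.py",
) -> List[str]:
    """Assemble the argv list for the inference command shown above."""
    return [
        sys.executable, script,
        "--eval_json_path", eval_json_path,
        "--dit_lora_path", dit_lora_path,
        "--saved_dir", saved_dir,
        "--width", str(width),
        "--height", str(height),
        "--ref_size", str(ref_size),
        "--seed", str(seed),
        "--rope_type", rope_type,
        "--a", str(a),
    ]


if __name__ == "__main__":
    for seed in (3074, 3075):  # hypothetical seed sweep
        cmd = build_inference_command(
            "path/to/your/val_config.json",
            seed=seed,
            saved_dir=f"./res/seed_{seed}",
        )
        # Print only; pass `cmd` to subprocess.run(cmd, check=True) to launch it.
        print(shlex.join(cmd))
```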

	---

	## 🙏 Acknowledgments
	Our code is built upon the [UNO](https://github.com/bytedance/UNO) framework. We sincerely thank the authors for their excellent work and open-source contributions.

	---

	## 🌟 Citation
	If you find our work helpful for your research, please consider giving us a star ⭐ and citing our paper:

	```bibtex
	@article{hu2025positionic,
	title={PositionIC: Unified Position and Identity Consistency for Image Customization},
	author={Hu, Junjie and Han, Tianyang and Ma, Kai and Gao, Jialin and Yang, Song and He, Xianhua and Luo, Junfeng and Wei, Xiaoming and Zhang, Wenqiang},
	journal={arXiv preprint arXiv:2507.13861},
	year={2025}
	}
	```

	---

	## 📄 License
	This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

	---