TransNormal
Official model weights for TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation (ICML 2026).
TransNormal estimates camera-space surface normal maps from a single RGB image, with a focus on transparent objects such as laboratory glassware. The model adapts Stable Diffusion 2 as a single-step normal regressor and injects dense DINOv3 visual semantics through cross-attention.
Links: Paper | Project page | Code | Dataset
Important: The generic Hugging Face / Diffusers "Use this model" snippet is not sufficient for this repository. TransNormal uses a custom pipeline and requires a DINOv3 backbone in addition to the weights stored here. Please use the instructions below.
What This Repository Contains
This model repository contains:
- Fine-tuned TransNormal diffusion pipeline weights.
- cross_attention_projector.pt, the DINOv3-to-U-Net cross-attention projector.
- SD2-compatible VAE, U-Net, tokenizer, scheduler, and config files.
This repository does not contain the DINOv3 backbone weights. Download them separately as described below.
Installation
git clone https://github.com/longxiang-ai/TransNormal.git
cd TransNormal
conda create -n TransNormal python=3.10 -y
conda activate TransNormal
pip install -r requirements.txt
The code requires transformers>=4.56.0 for Hugging Face DINOv3 support. BF16 is recommended for DINOv3 inference.
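The version requirement can be checked before loading DINOv3 with a small script such as the sketch below. The helper names `version_tuple` and `meets_minimum` are illustrative and not part of the TransNormal code, and the comparison deliberately ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.56.0"  # minimum version stated in this README


def version_tuple(version: str) -> tuple:
    """Return the leading numeric release as a 3-tuple, e.g. '4.56.0.dev0' -> (4, 56, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))  # pad '4.56' to (4, 56, 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """True if the installed release is at least the minimum (pre-release tags are ignored)."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else f"too old, need >= {MIN_TRANSFORMERS}"
        print(f"transformers {installed}: {status}")
```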
Download Weights
Download the TransNormal weights from this repository:
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('Longxiang-ai/TransNormal', local_dir='./weights/transnormal')"
Download the DINOv3 ViT-H+/16 backbone separately:
python -c "from huggingface_hub import snapshot_download; snapshot_download('facebook/dinov3-vith16plus-pretrain-lvd1689m', local_dir='./weights/dinov3_vith16plus')"
Access to DINOv3 may require approval from Meta / Hugging Face. See the DINOv3 repository and Meta AI DINOv3 downloads for details.
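After both downloads finish, a quick sanity check of the local layout can catch a failed or gated download early. The sketch below only checks the paths named in this README; `missing_weight_paths` is a hypothetical helper, not part of the TransNormal code, and it does not validate file contents.

```python
from pathlib import Path


def missing_weight_paths(root: str = "./weights") -> list:
    """Return the expected paths under `root` that are missing or empty."""
    base = Path(root)
    problems = []
    # Both snapshot directories should exist and contain at least one entry.
    for folder in (base / "transnormal", base / "dinov3_vith16plus"):
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(str(folder))
    # The projector checkpoint is the one file this README names explicitly.
    projector = base / "transnormal" / "cross_attention_projector.pt"
    if not projector.is_file():
        problems.append(str(projector))
    return problems


if __name__ == "__main__":
    missing = missing_weight_paths()
    print("all expected paths found" if not missing else f"missing: {missing}")
```

An empty DINOv3 directory usually means access to the gated repository has not been granted yet.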
Python Usage
import torch
from transnormal import TransNormalPipeline, create_dino_encoder, save_normal_map

device = "cuda"
dtype = torch.bfloat16

# Build the DINOv3 encoder and load the cross-attention projector.
dino_encoder = create_dino_encoder(
    model_name="dinov3_vith16plus",
    weights_path="./weights/dinov3_vith16plus",
    projector_path="./weights/transnormal/cross_attention_projector.pt",
    device=device,
    dtype=dtype,
    freeze_encoder=True,
)

# Load the TransNormal pipeline from the downloaded weights.
pipe = TransNormalPipeline.from_pretrained(
    "./weights/transnormal",
    dino_encoder=dino_encoder,
    torch_dtype=dtype,
    safety_checker=None,
)
pipe = pipe.to(device)

# Predict the normal map for one image.
normal_map = pipe(
    image="path/to/image.jpg",
    timestep=999,
    output_type="np",
)

save_normal_map(normal_map, "output_normal.png")
Command Line Usage
Single image:
python inference.py \
    --image path/to/image.jpg \
    --output normal.png \
    --model_path ./weights/transnormal \
    --dino_path ./weights/dinov3_vith16plus \
    --projector_path ./weights/transnormal/cross_attention_projector.pt \
    --timestep 999
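To call the single-image script from another Python program, one option is to assemble the same arguments as a list and pass them to `subprocess.run`. This sketch uses only the flags shown above; `build_inference_command` and `run_inference` are hypothetical helper names, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_inference_command(
    image: str,
    output: str,
    model_path: str = "./weights/transnormal",
    dino_path: str = "./weights/dinov3_vith16plus",
    projector_path: str = "./weights/transnormal/cross_attention_projector.pt",
    timestep: int = 999,
) -> list:
    """Assemble the argument list for the single-image command shown above."""
    return [
        sys.executable, "inference.py",
        "--image", image,
        "--output", output,
        "--model_path", model_path,
        "--dino_path", dino_path,
        "--projector_path", projector_path,
        "--timestep", str(timestep),
    ]


def run_inference(image: str, output: str) -> None:
    """Run inference.py in a subprocess; raises CalledProcessError on failure."""
    subprocess.run(build_inference_command(image, output), check=True)
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.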
Batch inference:
python inference_batch.py \
    --input_dir ./examples/input \
    --output_dir ./examples/output \
    --model_path ./weights/transnormal \
    --dino_path ./weights/dinov3_vith16plus \
    --timestep 999
Output Format
The output is a normal-map visualization in [0, 1], where 0.5 represents zero for each normal component. See the GitHub README for the current camera-coordinate convention and saving utilities.
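The encoding described above can be inverted to recover signed normal components. The sketch below works on a single pixel and assumes the mapping is linear with 0 mapping to -1 and 1 mapping to +1, which the text implies but does not state; `decode_normal` and `encode_normal` are illustrative names, not functions from the TransNormal code. Which axis each channel represents depends on the camera-coordinate convention in the GitHub README.

```python
import math


def decode_normal(rgb):
    """Map one visualization pixel with components in [0, 1] to a unit normal."""
    # Assumed linear mapping: 0.0 -> -1, 0.5 -> 0, 1.0 -> +1 per component.
    n = [2.0 * c - 1.0 for c in rgb]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # a mid-gray pixel carries no direction
    # Renormalize, since quantization can leave the vector slightly off unit length.
    return tuple(c / length for c in n)


def encode_normal(normal):
    """Inverse mapping: components in [-1, 1] back to [0, 1]."""
    return tuple(0.5 * (c + 1.0) for c in normal)


print(decode_normal((0.5, 0.5, 1.0)))  # (0.0, 0.0, 1.0)
```

For an 8-bit PNG, divide each channel by 255 before decoding. The same per-component arithmetic applies elementwise to a full array.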
Dataset
The accompanying TransNormal-Synthetic dataset is available at:
https://huggingface.co/datasets/Longxiang-ai/TransNormal-Synthetic
It provides physics-based rendered transparent labware scenes with RGB images, surface normal maps, depth maps, masks, material variants, and camera metadata.
License
This model is released under CC BY-NC 4.0. For commercial licensing inquiries, please contact the authors.
Citation
If you find this work useful, please cite:
@misc{li2026transnormal,
  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
  author={Mingwei Li and Hehe Fan and Yi Yang},
  year={2026},
  eprint={2602.00839},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.00839},
}
Model tree for Longxiang-ai/TransNormal
Base model
stabilityai/stable-diffusion-2-base