TransNormal

Official model weights for TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation (ICML 2026).

TransNormal estimates camera-space surface normal maps from a single RGB image, with a focus on transparent objects such as laboratory glassware. The model adapts Stable Diffusion 2 as a single-step normal regressor and injects dense DINOv3 visual semantics through cross-attention.
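The card says dense DINOv3 visual semantics are "injected through cross-attention". The sketch below shows what that means mechanically, using generic single-head cross-attention. Flattened U-Net features act as queries, and projected dense patch tokens act as keys and values.

This is an illustration under stated assumptions, not TransNormal's actual module. The function names, the tiny channel widths, the random weights, and the linear w_proj standing in for the cross-attention projector are all hypothetical. Only the 16-pixel patch size comes from the ViT-H+/16 backbone name.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(unet_feats, dino_tokens, w_proj, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, hypothetical shapes).

    unet_feats:  (N_q, C)  flattened U-Net spatial features (queries)
    dino_tokens: (N_k, D)  dense patch tokens from a visual backbone
    w_proj:      (D, C)    hypothetical projector from backbone width to U-Net width
    """
    ctx = dino_tokens @ w_proj                 # (N_k, C) projected visual context
    q = unet_feats @ w_q                       # (N_q, C)
    k = ctx @ w_k                              # (N_k, C)
    v = ctx @ w_v                              # (N_k, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N_q, N_k) scaled dot products
    attn = softmax(scores, axis=-1)            # each row sums to 1 over patches
    return attn @ v, attn                      # (N_q, C), (N_q, N_k)


rng = np.random.default_rng(0)
C, D = 8, 16
# A 64x64 input with 16x16 patches gives a 4x4 grid, i.e. 16 dense tokens.
n_tokens = (64 // 16) * (64 // 16)
unet_feats = rng.standard_normal((32, C))
dino_tokens = rng.standard_normal((n_tokens, D))
w_proj = rng.standard_normal((D, C)) / np.sqrt(D)
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

out, attn = cross_attention(unet_feats, dino_tokens, w_proj, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Each row of attn is a distribution over image patches. Every U-Net location can therefore draw on semantics from any region of the image, not only its local neighborhood.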

Links: Paper | Project page | Code | Dataset

Important: The generic Hugging Face / Diffusers "Use this model" snippet is not sufficient for this repository. TransNormal uses a custom pipeline and requires a DINOv3 backbone in addition to the weights stored here. Please use the instructions below.

What This Repository Contains

This model repository contains:

Fine-tuned TransNormal diffusion pipeline weights.
cross_attention_projector.pt, the DINOv3-to-U-Net cross-attention projector.
SD2-compatible VAE, U-Net, tokenizer, scheduler, and config files.

This repository does not contain the DINOv3 backbone weights. Download them separately as described below.
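Before loading the weights, you can confirm the download by checking for the expected entries. This is a minimal sketch. The vae, unet, tokenizer, and scheduler subfolder names assume the standard Diffusers layout and are not confirmed by this card; only cross_attention_projector.pt is named here. missing_items is a hypothetical helper.

```python
from pathlib import Path

# Assumed layout: standard Diffusers subfolders (an assumption) plus the
# projector file named in this card. Adjust if the actual layout differs.
EXPECTED = ["vae", "unet", "tokenizer", "scheduler", "cross_attention_projector.pt"]


def missing_items(weights_dir):
    """Return the expected entries that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_items("./weights/transnormal")
    print("OK" if not missing else f"Missing: {missing}")
```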

Installation

git clone https://github.com/longxiang-ai/TransNormal.git
cd TransNormal

conda create -n TransNormal python=3.10 -y
conda activate TransNormal
pip install -r requirements.txt

The code requires transformers>=4.56.0 for Hugging Face DINOv3 support. BF16 is recommended for DINOv3 inference.
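You can check the transformers requirement from Python using only the standard library. importlib.metadata.version reads the installed version, and the small comparison helpers avoid a dependency on packaging. The helper names are hypothetical.

The parser ignores pre-release tags such as rc1 or .dev0. If packaging is available, packaging.version.Version is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v):
    """Parse the numeric release part, e.g. '4.56.0' -> (4, 56, 0).

    Keeps the leading digits of each dot-separated piece and stops at a piece
    with none, so pre-release tags like 'rc1' or '.dev0' are ignored
    (a simplification).
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required="4.56.0"):
    """True if installed >= required, padding with zeros so '4.56' == '4.56.0'."""
    a, b = version_tuple(installed), version_tuple(required)
    n = max(len(a), len(b))
    a = a + (0,) * (n - len(a))
    b = b + (0,) * (n - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old for DINOv3 support"
        print(f"transformers {installed}: {status}")
```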

Download Weights

Download the TransNormal weights from this repository:

pip install huggingface_hub

python -c "from huggingface_hub import snapshot_download; snapshot_download('Longxiang-ai/TransNormal', local_dir='./weights/transnormal')"

Download the DINOv3 ViT-H+/16 backbone separately:

python -c "from huggingface_hub import snapshot_download; snapshot_download('facebook/dinov3-vith16plus-pretrain-lvd1689m', local_dir='./weights/dinov3_vith16plus')"

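Access to the DINOv3 weights may require approval from Meta on Hugging Face. If the repository is gated, you also need to log in (for example with huggingface-cli login) before downloading. See the DINOv3 repository and the Meta AI DINOv3 downloads page for details.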
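If you prefer a script to shell one-liners, the same two downloads can be written in Python. This sketch uses only huggingface_hub.snapshot_download, with the repo IDs and local directories from the commands above. DOWNLOADS and download_all are hypothetical names. For the gated DINOv3 repository, authenticate first, for example with huggingface-cli login.

```python
# Python equivalent of the two one-liners above.
DOWNLOADS = [
    # (Hugging Face repo id, local directory used elsewhere in this card)
    ("Longxiang-ai/TransNormal", "./weights/transnormal"),
    ("facebook/dinov3-vith16plus-pretrain-lvd1689m", "./weights/dinov3_vith16plus"),
]


def download_all(downloads=DOWNLOADS):
    # Imported lazily so the plan above can be inspected without the package.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in downloads:
        print(f"Downloading {repo_id} -> {local_dir}")
        snapshot_download(repo_id, local_dir=local_dir)
```

Call download_all() once from the repository root. The local paths then match those used in the Python and command line examples.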

Python Usage

import torch
from transnormal import TransNormalPipeline, create_dino_encoder, save_normal_map

device = "cuda"
dtype = torch.bfloat16  # BF16 is recommended for DINOv3 inference

# DINOv3 backbone (downloaded separately) plus the cross-attention projector from this repository
dino_encoder = create_dino_encoder(
    model_name="dinov3_vith16plus",
    weights_path="./weights/dinov3_vith16plus",
    projector_path="./weights/transnormal/cross_attention_projector.pt",
    device=device,
    dtype=dtype,
    freeze_encoder=True,
)

# Fine-tuned TransNormal pipeline weights from this repository
pipe = TransNormalPipeline.from_pretrained(
    "./weights/transnormal",
    dino_encoder=dino_encoder,
    torch_dtype=dtype,
    safety_checker=None,
)
pipe = pipe.to(device)

# Predict a normal map for a single RGB image
normal_map = pipe(
    image="path/to/image.jpg",
    timestep=999,
    output_type="np",
)

# Save the [0, 1] normal-map visualization
save_normal_map(normal_map, "output_normal.png")

Command Line Usage

Single image:

python inference.py \
    --image path/to/image.jpg \
    --output normal.png \
    --model_path ./weights/transnormal \
    --dino_path ./weights/dinov3_vith16plus \
    --projector_path ./weights/transnormal/cross_attention_projector.pt \
    --timestep 999
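To call the single-image command from a larger script or evaluation harness, you can assemble the arguments as a list and pass them to subprocess. This is a hypothetical wrapper: build_single_cmd and run_single are illustrative names. It uses only the flags shown above, with the same default paths.

```python
import subprocess
import sys


def build_single_cmd(image, output,
                     model_path="./weights/transnormal",
                     dino_path="./weights/dinov3_vith16plus",
                     projector_path="./weights/transnormal/cross_attention_projector.pt",
                     timestep=999):
    """Build the single-image command using only the flags shown in this card."""
    return [
        sys.executable, "inference.py",
        "--image", str(image),
        "--output", str(output),
        "--model_path", str(model_path),
        "--dino_path", str(dino_path),
        "--projector_path", str(projector_path),
        "--timestep", str(timestep),
    ]


def run_single(image, output, **kwargs):
    # Run from the repository root so inference.py resolves; check=True raises on failure.
    subprocess.run(build_single_cmd(image, output, **kwargs), check=True)
```

For example, running run_single("path/to/image.jpg", "normal.png") from the repository root is equivalent to the command above.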

Batch inference:

python inference_batch.py \
    --input_dir ./examples/input \
    --output_dir ./examples/output \
    --model_path ./weights/transnormal \
    --dino_path ./weights/dinov3_vith16plus \
    --timestep 999
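To run the Python API over a folder instead, you can plan the input/output pairs first. This is a sketch under assumptions: the accepted extensions, the non-recursive scan, and the _normal.png naming rule are hypothetical and may not match what inference_batch.py does.

```python
from pathlib import Path

# Hypothetical extension list and naming rule; inference_batch.py may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def plan_batch(input_dir, output_dir, suffix="_normal.png"):
    """Pair each image in input_dir with an output path in output_dir.

    The scan is non-recursive and sorted by path for a stable order.
    """
    out_root = Path(output_dir)
    pairs = []
    for src in sorted(Path(input_dir).iterdir()):
        if src.is_file() and src.suffix.lower() in IMAGE_EXTS:
            pairs.append((src, out_root / f"{src.stem}{suffix}"))
    return pairs
```

Create output_dir first. Then pass each (src, dst) pair through the pipeline from the Python usage section: call pipe(image=str(src), timestep=999, output_type="np") and pass the result to save_normal_map with str(dst).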

Output Format

The output is a normal-map visualization in [0, 1], where 0.5 represents zero for each normal component. See the GitHub README for the current camera-coordinate convention and saving utilities.
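You may want signed normal components rather than the [0, 1] visualization. The statement that 0.5 represents zero suggests the usual linear encoding v = (n + 1) / 2, which inverts to n = 2v - 1.

The sketch below assumes that encoding and a channel-last (H, W, 3) float array in [0, 1]. If you read back an 8-bit PNG, divide by 255 first. It does not handle axis directions, so check the camera-coordinate convention in the GitHub README before using the vectors. decode_normals is a hypothetical helper.

```python
import numpy as np


def decode_normals(vis, eps=1e-8):
    """Map an (H, W, 3) visualization in [0, 1] back to unit normals in [-1, 1].

    Assumes the linear encoding vis = (n + 1) / 2, so 0.5 decodes to 0.
    Axis directions (the camera convention) are NOT handled here.
    """
    n = 2.0 * np.asarray(vis, dtype=np.float64) - 1.0
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    # Renormalize: quantization and clipping leave vectors slightly off unit length.
    return n / np.maximum(norm, eps)


vis = np.array([[[0.5, 0.5, 1.0], [1.0, 0.5, 0.5]]])  # shape (1, 2, 3)
print(decode_normals(vis))
```

Renormalizing matters because an 8-bit round trip rarely preserves unit length exactly.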

Dataset

The accompanying TransNormal-Synthetic dataset is available at:

https://huggingface.co/datasets/Longxiang-ai/TransNormal-Synthetic

It provides physics-based rendered transparent labware scenes with RGB images, surface normal maps, depth maps, masks, material variants, and camera metadata.

License

This model is released under CC BY-NC 4.0. For commercial licensing inquiries, please contact the authors.

Citation

If you find this work useful, please cite:

@misc{li2026transnormal,
      title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
      author={Mingwei Li and Hehe Fan and Yi Yang},
      year={2026},
      eprint={2602.00839},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.00839},
}


Model Tree

Base model: stabilityai/stable-diffusion-2-base (this model is a fine-tune of it).

Paper: TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation (arXiv 2602.00839, published Jan 31).