---
license: other
license_name: sv3d-nc-community
license_link: LICENSE
datasets:
- allenai/objaverse
pipeline_tag: image-to-video
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/sv3d/blob/main/LICENSE.md) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - Yes
    - No
---
# [SV3D-diffusers](https://github.com/chenguolin/sv3d-diffusers)

![](https://huggingface.co/chenguolin/sv3d-diffusers/resolve/main/assets/sv3dp.gif?download=true)

This repo (https://github.com/chenguolin/sv3d-diffusers) provides scripts for:

1. A spatio-temporal UNet (`SV3DUNetSpatioTemporalConditionModel`) and pipeline (`StableVideo3DDiffusionPipeline`), modified from [SVD](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py) for [SV3D](https://sv3d.github.io) and following the [diffusers](https://github.com/huggingface/diffusers) convention.
2. Converting [Stability-AI](https://github.com/Stability-AI/generative-models)'s [SV3D-p UNet checkpoint](https://huggingface.co/stabilityai/sv3d) to the [diffusers](https://github.com/huggingface/diffusers) convention.
3. Running inference with the `SV3D-p` model through the [diffusers](https://github.com/huggingface/diffusers) library to synthesize a 21-frame orbital video around a 3D object from a single-view image (preprocessed first by removing the background and centering the object).

The converted SV3D-p checkpoints have been uploaded to HuggingFace🤗 [chenguolin/sv3d-diffusers](https://huggingface.co/chenguolin/sv3d-diffusers).
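
The preprocessing mentioned in item 3 (background removal, then centering) can be sketched with Pillow. This is a minimal illustration, not the repo's own preprocessing code: it assumes the background has already been removed by an external tool so that the input carries an alpha channel, and the function name `center_on_square`, the white background, the 576-pixel canvas, and the `fill_ratio` margin are all assumptions made for the example.

```python
from PIL import Image


def center_on_square(rgba: Image.Image, size: int = 576, fill_ratio: float = 0.8) -> Image.Image:
    """Crop an RGBA cutout to its alpha bounding box and center it on a white square.

    `size` and `fill_ratio` are illustrative defaults, not values read from this repo.
    """
    rgba = rgba.convert("RGBA")

    # Bounding box of all pixels with non-zero alpha (None if fully transparent).
    box = rgba.getchannel("A").getbbox()
    if box is None:
        raise ValueError("image has no foreground pixels (alpha is zero everywhere)")
    obj = rgba.crop(box)

    # Scale so the longer side of the object spans `fill_ratio` of the canvas.
    scale = fill_ratio * size / max(obj.size)
    new_w = max(1, round(obj.width * scale))
    new_h = max(1, round(obj.height * scale))
    obj = obj.resize((new_w, new_h), Image.LANCZOS)

    # Paste onto a white canvas, using the object's alpha channel as the mask.
    canvas = Image.new("RGBA", (size, size), (255, 255, 255, 255))
    offset = ((size - new_w) // 2, (size - new_h) // 2)
    canvas.paste(obj, offset, mask=obj)
    return canvas.convert("RGB")
```

Calling `center_on_square(Image.open("cutout.png"))` on a hypothetical RGBA cutout returns an RGB image with the object centered, which could then be passed to an inference script as its input image.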
## 🚀 Usage

```bash
git clone https://github.com/chenguolin/sv3d-diffusers.git
# Please install PyTorch first according to your CUDA version
pip3 install -r requirements.txt
# If you can't access HuggingFace🤗, try:
# export HF_ENDPOINT=https://hf-mirror.com
python3 infer.py --output_dir out/ --image_path assets/images/sculpture.png --elevation 10 --half_precision --seed -1
```
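The `--elevation 10` argument in the command above presumably fixes the camera elevation (in degrees) for the whole orbit. To make the idea of a 21-frame orbit concrete, here is a small sketch that builds per-frame camera angles. It assumes the azimuths are evenly spaced over a full turn and that the last frame returns to the input view; the helper name `static_orbit` and this spacing convention are assumptions for illustration, not something read from `infer.py`.

```python
def static_orbit(num_frames: int = 21, elevation_deg: float = 10.0):
    """Return per-frame (elevation, azimuth) pairs in degrees for a fixed-elevation orbit.

    Assumed convention: azimuth 0 is the input view, frames are evenly spaced,
    and the final frame lands back on azimuth 0.
    """
    cameras = []
    for i in range(num_frames):
        # Frame i sits (i + 1) steps away from the input view; wrap at 360 degrees.
        azimuth = (360.0 * (i + 1) / num_frames) % 360.0
        cameras.append((elevation_deg, azimuth))
    return cameras


for elevation, azimuth in static_orbit()[:3]:
    print(f"elevation={elevation:.1f} azimuth={azimuth:.2f}")
```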
The synthesized video will be saved to `out/` as a `.gif` file.
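
One common way to write a sequence of frames as an animated GIF is Pillow's `Image.save` with `save_all=True`. The sketch below shows that approach under stated assumptions: the helper name `save_gif`, the output file name, and the frame rate of 7 fps are placeholders, and whether `infer.py` writes its output this way is not stated in this README.

```python
from pathlib import Path

from PIL import Image


def save_gif(frames, path, fps: int = 7) -> Path:
    """Write a list of PIL images to `path` as a looping animated GIF."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    first, *rest = frames
    first.save(
        path,
        save_all=True,               # write every frame, not just the first
        append_images=rest,          # the remaining frames, in order
        duration=round(1000 / fps),  # per-frame display time in milliseconds
        loop=0,                      # 0 means loop forever
    )
    return path
```

For example, `save_gif(frames, "out/sculpture.gif")` would write a list of 21 PIL images into the `out/` directory, creating it if needed.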
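## 📸 Results

> Image preprocessing and random seeds differ between implementations, so the results are presented for reference only.

| Implementation | sculpture | bag | kunkun |
| :------------- | :------: | :----: | :----: |
| **SV3D-diffusers (Ours)** | ![](https://huggingface.co/chenguolin/sv3d-diffusers/resolve/main/assets/sculpture.gif?download=true) | ![](https://huggingface.co/chenguolin/sv3d-diffusers/resolve/main/assets/bag.gif?download=true) | ![](https://huggingface.co/chenguolin/sv3d-diffusers/resolve/main/assets/kunkun.gif?download=true) |
| **Official SV3D** | ![](https://huggingface.co/chenguolin/sv3d-diffusers/resolve/main/assets/official_sculpture.gif?download=true) | ![](https://huggingface.co/chenguolin/sv3d-diffusers/resolve/main/assets/official_bag.gif?download=true) | ![](https://huggingface.co/chenguolin/sv3d-diffusers/resolve/main/assets/official_kunkun.gif?download=true) |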
## 📚 Citation

If you find this repo helpful, please consider giving it a star 🌟 and citing the original SV3D paper.

```
@inproceedings{voleti2024sv3d,
  author={Voleti, Vikram and Yao, Chun-Han and Boss, Mark and Letts, Adam and Pankratz, David and Tochilkin, Dmitrii and Laforte, Christian and Rombach, Robin and Jampani, Varun},
  title={{SV3D}: Novel Multi-view Synthesis and {3D} Generation from a Single Image using Latent Video Diffusion},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024},
}
```