CogVideoX-5B: Ternary Quantized (tritplane3)
A ternary-quantized version of zai-org/CogVideoX-5b, produced with ternary-quant by applying component-aware tritplane3 quantization to the Diffusion Transformer (DiT) backbone.
This is an experimental diffusers-compatible artifact. It is not a benchmarked replacement for FP8, int8, or other production video quantization paths.
Model Specifications
| Property | Value |
|---|---|
| Base Model | zai-org/CogVideoX-5b |
| Architecture | Diffusion Transformer (CogVideoXTransformer3DModel) |
| Transformer Params | 5.57B |
| Quantization | tritplane3 (3-plane progressive ternary) |
| Components Quantized | 341 linear layers in the DiT |
| Text Encoder (T5) | FP16 (preserved) |
| VAE (3D causal) | FP16 (preserved) |
| License | Apache 2.0 |
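The table describes tritplane3 only as "3-plane progressive ternary." One common way to build such a representation is residual quantization. You fit a scaled ternary plane to the weights, subtract it, and fit the next plane to what remains. The result is W ≈ s1·T1 + s2·T2 + s3·T3, with every Tk in {-1, 0, +1}.

The sketch below assumes that reading. It uses the Ternary Weight Networks threshold (0.7 × mean |w|) and a single scale per plane. The function names are hypothetical, and ternary-quant's actual tritplane3 algorithm may differ in scale granularity, thresholds, calibration, and what "component-aware" selects.

```python
import numpy as np


def ternarize(x, threshold_ratio=0.7):
    """Fit one scaled ternary plane to x: returns (scale, codes), codes in {-1, 0, +1}.

    Uses the Ternary Weight Networks threshold heuristic (an assumption,
    not necessarily what ternary-quant does).
    """
    delta = threshold_ratio * np.mean(np.abs(x))
    codes = np.zeros(x.shape, dtype=np.int8)
    codes[x > delta] = 1
    codes[x < -delta] = -1
    mask = codes != 0
    # Least-squares optimal scale for fixed codes: mean |x| over the nonzero positions.
    scale = float(np.abs(x[mask]).mean()) if mask.any() else 0.0
    return scale, codes


def progressive_ternary(w, num_planes=3):
    """Hypothetical residual ternary decomposition with num_planes planes."""
    residual = np.asarray(w, dtype=np.float64).copy()
    planes = []
    for _ in range(num_planes):
        scale, codes = ternarize(residual)
        planes.append((scale, codes))
        residual -= scale * codes  # the next plane fits what this one missed
    return planes


def reconstruct(planes):
    """Dequantize: sum of scale * codes over all planes."""
    return sum(scale * codes.astype(np.float64) for scale, codes in planes)


# Usage: the relative reconstruction error shrinks as planes are added.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
planes = progressive_ternary(w, num_planes=3)
for k in range(1, 4):
    err = np.linalg.norm(w - reconstruct(planes[:k])) / np.linalg.norm(w)
    print(f"{k} plane(s): relative error {err:.3f}")
```

Because each scale is the least-squares optimum for its codes, adding a plane never increases the reconstruction error.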
Size & Compression
| Method | Transformer Size | Bits/Weight | Compression |
|---|---|---|---|
| FP16 (original) | 11.14 GB | 16 | 1.0× |
| Ternary tritplane3 (theoretical, packed) | ~5.57 GB | ~8 | 2.0× |
| FP16 (as stored in this repo) | 11.14 GB | 16 | 1.0× on disk |
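The sizes in this table follow from parameter count × bits per weight ÷ 8. The check below uses the 5.57B transformer parameters from the specification table (the function name is illustrative). The ~8 bits/weight figure is taken from the table, not derived here.

```python
def transformer_size_gb(num_params, bits_per_weight):
    """Storage in GB (1 GB = 1e9 bytes) for num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 5_570_000_000  # 5.57B transformer parameters

fp16_gb = transformer_size_gb(PARAMS, 16)
packed_gb = transformer_size_gb(PARAMS, 8)  # ~8 bits/weight, theoretical packed size
print(f"FP16: {fp16_gb:.2f} GB, packed tritplane3: {packed_gb:.2f} GB, "
      f"compression: {fp16_gb / packed_gb:.1f}x")
# prints: FP16: 11.14 GB, packed tritplane3: 5.57 GB, compression: 2.0x
```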
Honest note: the weights are ternary-quantized but stored in FP16, so the checkpoint loads as a drop-in with the standard diffusers pipeline. Realizing the 2× on-disk compression would require storing the weights in a packed tritplane format, which in turn requires a custom inference wrapper.
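The sketch below makes "packed" concrete with one simple, hypothetical layout: 2 bits per ternary code, four codes per byte, with a lossless round trip. It is not the tritplane format used by ternary-quant, whose layout is not described here. It also ignores the per-plane scales that a real format would have to store alongside the codes.

```python
import numpy as np


def pack_ternary_2bit(codes):
    """Pack codes in {-1, 0, +1} into uint8, four 2-bit fields per byte.

    Returns (packed_bytes, n), where n is the original number of codes.
    Hypothetical layout for illustration only.
    """
    codes = np.asarray(codes, dtype=np.int8).ravel()
    n = codes.size
    pad = (-n) % 4  # pad to a multiple of 4 codes
    # Map -1, 0, +1 to the unsigned field values 0, 1, 2.
    fields = (np.concatenate([codes, np.zeros(pad, dtype=np.int8)]) + 1).astype(np.uint8)
    fields = fields.reshape(-1, 4)
    packed = fields[:, 0] | (fields[:, 1] << 2) | (fields[:, 2] << 4) | (fields[:, 3] << 6)
    return packed.astype(np.uint8), n


def unpack_ternary_2bit(packed, n):
    """Inverse of pack_ternary_2bit: recover the first n codes as int8."""
    packed = np.asarray(packed, dtype=np.uint8)
    fields = np.stack([(packed >> shift) & 0b11 for shift in (0, 2, 4, 6)], axis=1).ravel()[:n]
    return fields.astype(np.int8) - 1


codes = np.array([1, 0, -1, -1, 0, 1, 1], dtype=np.int8)
packed, n = pack_ternary_2bit(codes)
assert np.array_equal(unpack_ternary_2bit(packed, n), codes)
print(f"{n} codes -> {packed.nbytes} bytes")  # 7 codes -> 2 bytes
```

Two bits per code leaves some room unused, since a ternary code carries only log2(3) ≈ 1.58 bits. Base-3 packing of five codes per byte (3^5 = 243 ≤ 256) is denser but costlier to decode.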
Memory Requirements (Inference)
| Device | Peak Memory | Recommendation |
|---|---|---|
| Apple Silicon MPS (bfloat16) | ~24 GB unified | M2 Pro 32GB+ or M4 Pro 24GB+ |
| NVIDIA CUDA (bfloat16) | ~20 GB VRAM | RTX 4090 / A6000 |
| CPU | n/a | Not recommended (too slow) |
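If peak memory is too high for your device, diffusers offers memory-saving switches that the upstream CogVideoX examples also use:

- VAE tiling and slicing reduce the memory needed to decode latents into frames.
- Model CPU offloading keeps only the active sub-model on the accelerator.

The helper below applies whichever of these the pipeline exposes; its name is hypothetical. Its effect on this checkpoint is not measured here. CPU offloading replaces the quickstart's `pipe.to(device)` call rather than being combined with it. Offloading is mainly useful on CUDA GPUs with separate VRAM, since Apple Silicon already shares one memory pool between CPU and GPU.

```python
def apply_memory_savers(pipe, cpu_offload=False):
    """Enable optional diffusers memory savers if the pipeline exposes them.

    Returns the names of the switches that were applied.
    """
    applied = []
    vae = getattr(pipe, "vae", None)
    for name in ("enable_tiling", "enable_slicing"):
        # Tiling decodes frames in spatial tiles; slicing decodes one batch item at a time.
        if vae is not None and hasattr(vae, name):
            getattr(vae, name)()
            applied.append(f"vae.{name}")
    if cpu_offload and hasattr(pipe, "enable_model_cpu_offload"):
        # Moves each sub-model to the accelerator only while it runs;
        # use this instead of pipe.to("cuda"), not in addition to it.
        pipe.enable_model_cpu_offload()
        applied.append("enable_model_cpu_offload")
    return applied


# Usage with the quickstart pipeline on CUDA:
# apply_memory_savers(pipe, cpu_offload=True)  # then skip pipe.to(device)
```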
Quickstart
```bash
pip install diffusers transformers accelerate tiktoken sentencepiece protobuf imageio imageio-ffmpeg
```

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Weights are stored as FP16 on disk and cast to bfloat16 on load.
pipe = DiffusionPipeline.from_pretrained(
    "AsadIsmail/CogVideoX-5b-ternary",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

device = "mps"  # or "cuda"

if device == "mps":
    # MPS does not support float64, so cast any float64 scheduler tensors to float32.
    for attr in ("alphas_cumprod", "betas", "alphas", "sigmas"):
        val = getattr(pipe.scheduler, attr, None)
        if torch.is_tensor(val) and val.dtype == torch.float64:
            setattr(pipe.scheduler, attr, val.float())

pipe.to(device)
pipe.enable_attention_slicing()  # may have no effect on components without attention slicing

result = pipe(
    prompt="a panda playing bass guitar on stage",
    num_frames=9,  # short clip for a quick test
    num_inference_steps=25,
    guidance_scale=6.0,
    height=480,
    width=720,
    generator=torch.Generator(device=device).manual_seed(42),  # fixed seed for reproducibility
)

# result.frames[0] holds the frames of the first (and only) generated video.
export_to_video(result.frames[0], "output.mp4", fps=8)
```
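With `num_frames=9` and `fps=8`, the quickstart produces a clip of just over one second. The CogVideoX pipeline's default is 49 frames, about six seconds at 8 fps.

The sketch below assumes the VAE compresses time by a factor of 4: the first frame is kept, then one latent frame is produced per 4 video frames. Under that assumption, frame counts of the form 4k + 1 map cleanly onto latent frames. You can confirm the factor on a loaded pipeline via `pipe.vae_scale_factor_temporal`. The helper names are hypothetical.

```python
TEMPORAL_COMPRESSION = 4  # assumed CogVideoX VAE factor; check pipe.vae_scale_factor_temporal


def latent_frames(num_frames, factor=TEMPORAL_COMPRESSION):
    """Latent frames for num_frames video frames (first frame + one per `factor` frames).

    Raises for counts not of the form factor * k + 1; this is stricter than the
    pipeline itself, which may not reject such values.
    """
    if num_frames < 1 or (num_frames - 1) % factor != 0:
        raise ValueError(f"num_frames should be {factor}k + 1, got {num_frames}")
    return (num_frames - 1) // factor + 1


def clip_seconds(num_frames, fps):
    """Playback length of the exported video in seconds."""
    return num_frames / fps


for n in (9, 49):
    print(f"{n} frames -> {latent_frames(n)} latent frames, {clip_seconds(n, 8):.3f} s at 8 fps")
# 9 frames -> 3 latent frames, 1.125 s at 8 fps
# 49 frames -> 13 latent frames, 6.125 s at 8 fps
```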
Collection
Part of ternary-models.
GitHub: github.com/Asad-Ismail/ternary-models | Library: github.com/Asad-Ismail/ternary-quant