CogVideoX-5B โ€” Ternary Quantized (tritplane3)

Ternary-quantized version of zai-org/CogVideoX-5b produced with ternary-quant using component-aware tritplane3 quantization applied to the Diffusion Transformer (DiT) backbone.

This is an experimental diffusers-compatible artifact. It is not a benchmarked replacement for FP8, int8, or other production video quantization paths.

Model Specifications

Property Value
Base Model zai-org/CogVideoX-5b
Architecture Diffusion Transformer (CogVideoXTransformer3DModel)
Transformer Params 5.57B
Quantization tritplane3 (3-plane progressive ternary)
Components Quantized 341 linear layers in the DiT
Text Encoder (T5) FP16 (preserved)
VAE (3D causal) FP16 (preserved)
License Apache 2.0

Size & Compression

Method Transformer Size Bits/Weight Compression
FP16 (original) 11.14 GB 16 1.0ร—
Ternary tritplane3 (theoretical, packed) ~5.57 GB ~8 2.0ร—
FP16 (as stored in this repo) 11.14 GB 16 1.0ร— on disk

Honest note: Weights have ternary precision but are stored in FP16 format for drop-in compatibility with the standard diffusers pipeline. For actual 2ร— disk compression, weights would need packed tritplane format (requires custom inference wrapper).

Memory Requirements (Inference)

Device Peak Memory Recommendation
Apple Silicon MPS (bfloat16) ~24 GB unified M2 Pro 32GB+ or M4 Pro 24GB+
NVIDIA CUDA (bfloat16) ~20 GB VRAM RTX 4090 / A6000
CPU Not recommended Too slow

Quickstart

pip install diffusers transformers accelerate tiktoken sentencepiece protobuf imageio imageio-ffmpeg
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "AsadIsmail/CogVideoX-5b-ternary",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

device = "mps"  # or "cuda"
if device == "mps":
    for attr in ("alphas_cumprod", "betas", "alphas", "sigmas"):
        val = getattr(pipe.scheduler, attr, None)
        if torch.is_tensor(val) and val.dtype == torch.float64:
            setattr(pipe.scheduler, attr, val.float())

pipe.to(device)
pipe.enable_attention_slicing()

result = pipe(
    prompt="a panda playing bass guitar on stage",
    num_frames=9,
    num_inference_steps=25,
    guidance_scale=6.0,
    height=480, width=720,
    generator=torch.Generator(device=device).manual_seed(42),
)

export_to_video(result.frames[0], "output.mp4", fps=8)

Collection

Part of ternary-models.

GitHub: github.com/Asad-Ismail/ternary-models | Library: github.com/Asad-Ismail/ternary-quant

Downloads last month
2
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for AsadIsmail/CogVideoX-5b-ternary

Finetuned
(27)
this model

Collection including AsadIsmail/CogVideoX-5b-ternary