CogVideoX-5B — Ternary Quantized (tritplane3)

Ternary-quantized version of zai-org/CogVideoX-5b produced with ternary-quant using component-aware tritplane3 quantization applied to the Diffusion Transformer (DiT) backbone.

This is an experimental diffusers-compatible artifact. It is not a benchmarked replacement for FP8, int8, or other production video quantization paths.

Model Specifications

Property	Value
Base Model	zai-org/CogVideoX-5b
Architecture	Diffusion Transformer (CogVideoXTransformer3DModel)
Transformer Params	5.57B
Quantization	tritplane3 (3-plane progressive ternary)
Components Quantized	341 linear layers in the DiT
Text Encoder (T5)	FP16 (preserved)
VAE (3D causal)	FP16 (preserved)
License	Apache 2.0

Size & Compression

Method	Transformer Size	Bits/Weight	Compression
FP16 (original)	11.14 GB	16	1.0×
Ternary tritplane3 (theoretical, packed)	~5.57 GB	~8	2.0×
FP16 (as stored in this repo)	11.14 GB	16	1.0× on disk

Honest note: Weights have ternary precision but are stored in FP16 format for drop-in compatibility with the standard diffusers pipeline. For actual 2× disk compression, weights would need packed tritplane format (requires custom inference wrapper).

Memory Requirements (Inference)

Device	Peak Memory	Recommendation
Apple Silicon MPS (bfloat16)	~24 GB unified	M2 Pro 32GB+ or M4 Pro 24GB+
NVIDIA CUDA (bfloat16)	~20 GB VRAM	RTX 4090 / A6000
CPU	Not recommended	Too slow

Quickstart

pip install diffusers transformers accelerate tiktoken sentencepiece protobuf imageio imageio-ffmpeg

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "AsadIsmail/CogVideoX-5b-ternary",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

device = "mps"  # or "cuda"
if device == "mps":
    for attr in ("alphas_cumprod", "betas", "alphas", "sigmas"):
        val = getattr(pipe.scheduler, attr, None)
        if torch.is_tensor(val) and val.dtype == torch.float64:
            setattr(pipe.scheduler, attr, val.float())

pipe.to(device)
pipe.enable_attention_slicing()

result = pipe(
    prompt="a panda playing bass guitar on stage",
    num_frames=9,
    num_inference_steps=25,
    guidance_scale=6.0,
    height=480, width=720,
    generator=torch.Generator(device=device).manual_seed(42),
)

export_to_video(result.frames[0], "output.mp4", fps=8)

Collection

Part of ternary-models.

GitHub: github.com/Asad-Ismail/ternary-models | Library: github.com/Asad-Ismail/ternary-quant

Downloads last month: 2

Model tree for AsadIsmail/CogVideoX-5b-ternary

Base model

zai-org/CogVideoX-5b

Finetuned

(27)

this model

Collection including AsadIsmail/CogVideoX-5b-ternary

ternary-models: VLMs, Multimodal & Audio

Collection

Ternary-quantized models for architectures GGUF can't handle. tritplane3 scheme. • 16 items • Updated Apr 17 • 2