ternary-models: VLMs, Multimodal & Audio
Collection
Ternary-quantized models for architectures GGUF can't handle, using the tritplane3 scheme. • 16 items
How to use AsadIsmail/whisper-medium-ternary with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="AsadIsmail/whisper-medium-ternary")

# Or load the model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("AsadIsmail/whisper-medium-ternary", dtype="auto")
```

Ternary-quantized version of openai/whisper-medium.
| Property | Value |
|---|---|
| Base Model | openai/whisper-medium |
| Parameters | 769M |
| Quantization | tritplane3 (240 decoder layers) |
| Audio encoder | FP16 (preserved) |
| Stored size | 453 MB |
| FP16 size | ~3.1 GB |
| Compression | 1.30× |
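The tritplane3 packing format itself is not documented on this card, so as a rough illustration only, here is a generic sketch of how ternary quantization and trit packing typically work: weights are snapped to {-1, 0, +1} with a per-tensor scale, and five trits fit in one byte because 3^5 = 243 ≤ 256 (about 1.6 bits per weight). All function names and the threshold value below are hypothetical, not the actual tritplane3 implementation.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Quantize float weights to {-1, 0, +1} plus a per-tensor scale.

    Hypothetical illustration; the real tritplane3 scheme may differ.
    """
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold] = 1
    q[w < -threshold] = -1
    nonzero = np.abs(w[q != 0])
    scale = float(nonzero.mean()) if nonzero.size else 1.0
    return q, scale

def pack_trits(q):
    """Pack 5 ternary values per byte (3**5 = 243 <= 256), ~1.6 bits/weight."""
    t = (q.astype(np.int16) + 1).ravel()  # map {-1, 0, 1} -> {0, 1, 2}
    pad = (-t.size) % 5                   # pad to a multiple of 5 trits
    t = np.concatenate([t, np.zeros(pad, dtype=np.int16)])
    groups = t.reshape(-1, 5)
    powers = 3 ** np.arange(5)            # base-3 digit weights
    return (groups * powers).sum(axis=1).astype(np.uint8)

w = np.random.randn(16).astype(np.float32) * 0.1
q, s = ternarize(w)
packed = pack_trits(q)
print(packed.nbytes)  # 16 weights -> 4 bytes
```

Packing at this density is one reason the quantized checkpoint can be much smaller than the FP16 weights it replaces, though the overall ratio also depends on which layers stay in FP16 (here, the audio encoder).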
```python
from ternary_quant.inference import load_ternary_model
import numpy as np
import soundfile as sf
import torch

model, proc = load_ternary_model(
    "AsadIsmail/whisper-medium-ternary", runtime_mode="cached", device="cpu"
)
model = model.float()  # Required for encoder compatibility

# Transcribe audio
audio, sr = sf.read("audio.flac")
inputs = proc(audio.astype(np.float32), sampling_rate=sr, return_tensors="pt")
inputs = {k: v.float() for k, v in inputs.items()}

with torch.no_grad():
    ids = model.generate(**inputs, max_new_tokens=100)
print(proc.batch_decode(ids, skip_special_tokens=True)[0])
```
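Beyond storage savings, ternary weights make the core matrix product cheap: with entries restricted to {-1, 0, +1}, a linear layer reduces to additions and subtractions plus one scale multiply. The sketch below illustrates that equivalence; `ternary_matvec` is a hypothetical name for illustration, not part of the `ternary_quant` API.

```python
import numpy as np

def ternary_matvec(q, scale, x):
    """Compute scale * (q @ x) for ternary q without multiplies in the matmul:
    sum x over +1 positions, subtract the sum over -1 positions."""
    pos = (q == 1).astype(x.dtype)
    neg = (q == -1).astype(x.dtype)
    return scale * (pos @ x - neg @ x)

q = np.array([[1, 0, -1], [0, 1, 1]], dtype=np.int8)
x = np.array([1.0, 2.0, 3.0])
y = ternary_matvec(q, 0.5, x)
print(y)  # identical to 0.5 * (q.astype(float) @ x)
```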
Part of ternary-models.