# Prebuilt CUDA Wheels — Triton 3.6.0 & SageAttention 2.2.0
Pre-compiled Python wheels for Linux x86_64, built against CUDA 12.8 with Python 3.12.
No compilation needed — just pip install the .whl file matching your setup.
## Available Wheels

### Triton 3.6.0

| Wheel | Size | PyTorch | GPU |
|---|---|---|---|
| `triton-3.6.0-cp312-cp312-linux_x86_64.whl` | 339 MB | Any | All |

Triton is PyTorch-version independent — one wheel works with both PyTorch 2.7 and 2.10.
### SageAttention 2.2.0

| Wheel | Size | PyTorch | GPU Arch |
|---|---|---|---|
| `sageattention-2.2.0+cu128torch2.10.0sm90-…` | 21.1 MB | 2.10.0 | Hopper (sm90) |
| `sageattention-2.2.0+cu128torch2.10.0sm120-…` | 15.6 MB | 2.10.0 | Blackwell (sm120) |
| `sageattention-2.2.0+cu128torch2.7.0sm90-…` | 20.2 MB | 2.7.0 | Hopper (sm90) |
| `sageattention-2.2.0+cu128torch2.7.0sm120-…` | 14.9 MB | 2.7.0 | Blackwell (sm120) |
Pick the wheel matching your PyTorch version AND GPU architecture.
## Quick Install
```bash
# Install Triton
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/triton-3.6.0-cp312-cp312-linux_x86_64.whl

# Install SageAttention — pick ONE matching your setup:

# PyTorch 2.10 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.10 + Blackwell (B100, B200, GB200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.10.0sm120-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Hopper (H100, H200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm90-cp312-cp312-linux_x86_64.whl

# PyTorch 2.7 + Blackwell (B100, B200, GB200)
pip install https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/sageattention-2.2.0%2Bcu128torch2.7.0sm120-cp312-cp312-linux_x86_64.whl
```
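The SageAttention wheel filenames above follow a single pattern, so the download URL can be assembled programmatically instead of copied by hand. A minimal sketch in Python — the `sageattention_wheel_url` helper is illustrative, not part of either package:

```python
# Base URL of the Hugging Face repo hosting the wheels.
BASE = "https://huggingface.co/yo9otatara/prebuilt_wheels/resolve/main/"

def sageattention_wheel_url(torch_version: str, arch: str) -> str:
    """Build the wheel URL for a given PyTorch version ('2.7.0' or
    '2.10.0') and GPU architecture suffix ('sm90' or 'sm120')."""
    # The '+' in the wheel's local version tag must be percent-encoded
    # as %2B when it appears in a URL.
    name = f"sageattention-2.2.0%2Bcu128torch{torch_version}{arch}"
    return f"{BASE}{name}-cp312-cp312-linux_x86_64.whl"

print(sageattention_wheel_url("2.10.0", "sm90"))
```

Passing the resulting URL straight to `pip install` is equivalent to the commands above.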
## Requirements
- OS: Linux x86_64
- Python: 3.12
- CUDA: 12.8
- PyTorch: 2.7.0 or 2.10.0 (match the wheel)
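The requirements above can be encoded as a quick preflight check before installing. A sketch — the `meets_requirements` helper is hypothetical and not shipped with the wheels:

```python
def meets_requirements(py_version: tuple, torch_version: str, cuda_version: str) -> bool:
    """Return True if the environment matches what these wheels were
    built for: Python 3.12, CUDA 12.8, PyTorch 2.7.0 or 2.10.0."""
    return (
        py_version[:2] == (3, 12)
        and cuda_version == "12.8"
        and torch_version in ("2.7.0", "2.10.0")
    )

# In a live environment the inputs would come from sys.version_info,
# torch.__version__, and torch.version.cuda.
print(meets_requirements((3, 12), "2.10.0", "12.8"))
```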
## Which GPU wheel do I need?
| GPU | Architecture | Wheel suffix |
|---|---|---|
| H100, H200 | Hopper | sm90 |
| B100, B200, GB200 | Blackwell | sm120 |
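If you are unsure what your GPU reports, `torch.cuda.get_device_capability()` returns a `(major, minor)` compute-capability pair that maps directly onto the wheel suffix. A sketch — the `wheel_suffix` helper is illustrative, and only the sm90 and sm120 wheels in the table above are published here:

```python
def wheel_suffix(major: int, minor: int) -> str:
    """Map a CUDA compute capability, e.g. the pair returned by
    torch.cuda.get_device_capability(), to a wheel suffix."""
    cc = major * 10 + minor
    if cc == 90:   # Hopper: H100, H200
        return "sm90"
    if cc == 120:  # Blackwell: B100, B200, GB200
        return "sm120"
    raise ValueError(f"no prebuilt SageAttention wheel for sm{cc}")

print(wheel_suffix(9, 0))  # sm90
```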
## Build Info

- Built from source in a Docker container (`nvidia/cuda:12.8.0-devel-ubuntu22.04`)
- SageAttention source: SageAttention v2.2.0
- Triton source: Triton v3.6.0
- Split-arch build policy: each SageAttention wheel targets exactly one GPU architecture
## License
- Triton: MIT License
- SageAttention: Apache 2.0 License