FP8 quantization for Qwen3.5 models: nearly half the memory footprint, a 30% speedup, and the quantized models can be run with vllm serve
HyperAI
Hyper-AI
AI & ML interests
lightvl is a lightweight Vision-Language Model (VLM) quantization toolkit that supports FP8, INT8, and FP8-Block quantization. It integrates with vLLM for high-throughput inference and supports Qwen3-VL, Qwen3.5, InternVL-Chat, and Gemma-4 models.
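The "nearly half" memory figure in the tagline is consistent with storage width alone: FP8 uses one byte per weight, while 16-bit formats such as BF16 or FP16 use two. Below is a rough back-of-the-envelope sketch in Python. The 8B parameter count, the 16-bit baseline, and the 95% quantized fraction are illustrative assumptions, not measurements of lightvl, and the estimate ignores scale factors, activations, and the KV cache.

```python
def estimate_weight_gib(num_params: float,
                        quantized_fraction: float,
                        base_bytes: float = 2.0,
                        quant_bytes: float = 1.0) -> float:
    """Estimate weight storage in GiB for a partially quantized model.

    quantized_fraction is the share of parameters stored at quant_bytes
    (1 byte for FP8); the remainder stays at base_bytes (2 bytes for a
    16-bit baseline, an assumption here).
    """
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must be between 0 and 1")
    quantized = num_params * quantized_fraction * quant_bytes
    unquantized = num_params * (1.0 - quantized_fraction) * base_bytes
    return (quantized + unquantized) / 2**30


if __name__ == "__main__":
    params = 8e9  # hypothetical 8B-parameter model
    baseline = estimate_weight_gib(params, quantized_fraction=0.0)
    # Assumed: 95% of weights quantized, the rest left in 16-bit.
    mixed = estimate_weight_gib(params, quantized_fraction=0.95)
    print(f"16-bit weights: {baseline:.2f} GiB")
    print(f"mostly FP8:     {mixed:.2f} GiB")
    print(f"ratio:          {mixed / baseline:.3f}")
```

Under these assumptions the ratio stays slightly above 0.5, because any layers left in 16-bit keep their original size; that is one plausible reason the saving is "nearly" rather than exactly half.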
Quantize your model in two steps:
1. pip3 install lightvl
2. lightvl YOUR_HF_MODEL_PATH
Recent Activity
updated a model 9 days ago
Hyper-AI/Qwen3-VL-Embedding-8B-fp8
updated a model 9 days ago
Hyper-AI/gemma-4-31B-it-fp8
updated a model 9 days ago
Hyper-AI/gemma-4-E4B-it-fp8
Organizations
None yet