Qwen3-TTS DLL + ONNX (Minimal, Single-File ONNX)

This Hugging Face repository provides a minimal runtime bundle for Qwen3-TTS:

  • Rust DLL for audio preprocessing + tokenizer (BPE)
  • ONNX models (single .onnx files with embedded weights)
  • Minimal tokenizer files (config.json, vocab.json, merges.txt, tokenizer_config.json)
  • Python sample that runs the full pipeline using ONNX Runtime

Important: ONNX Runtime is not bundled. Install onnxruntime (CPU) or onnxruntime-gpu.

Directory Layout

dist/dll_release/
  qwen3_tts_rust.dll
  qwen3_tts.h
  README_dll_release.txt
  README.md
  onnx_kv/                     # 1.7B ONNX, embedded weights
  onnx_kv_06b/                  # 0.6B ONNX, embedded weights (optional)
  models/
    Qwen3-TTS-12Hz-1.7B-Base/
      config.json
      vocab.json
      merges.txt
      tokenizer_config.json
    Qwen3-TTS-12Hz-0.6B-Base/
      config.json
      vocab.json
      merges.txt
      tokenizer_config.json
  examples/python_dll_call/
    run_pipeline.py

Quick Start (Python)

1. Install dependencies

python -m pip install numpy onnxruntime

For GPU:

python -m pip install numpy onnxruntime-gpu

2. Set DLL path

set QWEN3_TTS_DLL=.\qwen3_tts_rust.dll

3. Run (1.7B)

python examples\python_dll_call\run_pipeline.py ^
  --onnx-dir .\onnx_kv ^
  --model-dir .\models\Qwen3-TTS-12Hz-1.7B-Base ^
  --ref-audio C:\path\to\ref.wav ^
  --ref-text  C:\path\to\ref.txt ^
  --text "Hello world."

4. Run (0.6B)

python examples\python_dll_call\run_pipeline.py ^
  --onnx-dir .\onnx_kv_06b ^
  --model-dir .\models\Qwen3-TTS-12Hz-0.6B-Base ^
  --ref-audio C:\path\to\ref.wav ^
  --ref-text  C:\path\to\ref.txt ^
  --text "Hello world."
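
If a run fails before any model loads, the DLL lookup from step 2 can be checked on its own. The sketch below is not taken from run_pipeline.py. It assumes the lookup order implied by this README: the QWEN3_TTS_DLL environment variable first, then qwen3_tts_rust.dll in the current folder. The helper names resolve_dll_path and load_dll are hypothetical.

```python
import ctypes
import os
from pathlib import Path

DLL_NAME = "qwen3_tts_rust.dll"


def resolve_dll_path(env=None, cwd=None):
    """Return the DLL path from QWEN3_TTS_DLL, else from the working folder.

    Assumed lookup order; the real script may resolve the DLL differently.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    configured = env.get("QWEN3_TTS_DLL")
    candidate = Path(configured) if configured else cwd / DLL_NAME
    if not candidate.is_file():
        raise FileNotFoundError(
            f"{candidate} not found; set QWEN3_TTS_DLL or run from the bundle folder"
        )
    return candidate.resolve()


def load_dll():
    """Load the DLL with ctypes (only succeeds on Windows with the real file)."""
    return ctypes.CDLL(str(resolve_dll_path()))
```

Calling load_dll() from the bundle folder either returns a ctypes handle or raises an error that names the path that was tried.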

CPU / GPU switching

  • Default: CUDA if available, otherwise CPU.
  • Force CPU:
python examples\python_dll_call\run_pipeline.py --device cpu ...
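
The default described above (CUDA if available, otherwise CPU) maps onto ONNX Runtime's execution-provider list. The following is a minimal sketch of that mapping, not the code in run_pipeline.py. The "auto" and "cuda" device values and the function names are assumptions; only --device cpu is documented here.

```python
def pick_providers(device, available):
    """Map a device choice to an ordered ONNX Runtime provider list.

    device: "auto", "cpu", or "cuda" ("auto" and "cuda" are assumed names).
    available: provider names, e.g. from onnxruntime.get_available_providers().
    """
    has_cuda = "CUDAExecutionProvider" in available
    if device == "cpu":
        return ["CPUExecutionProvider"]
    if device == "cuda":
        if not has_cuda:
            raise RuntimeError(
                "CUDAExecutionProvider not available: "
                "install onnxruntime-gpu or use --device cpu"
            )
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # "auto": prefer CUDA when present, always keep CPU as the fallback.
    return (["CUDAExecutionProvider"] if has_cuda else []) + ["CPUExecutionProvider"]


def create_session(model_path, device="auto"):
    """Create an InferenceSession using the providers chosen above."""
    import onnxruntime as ort  # imported lazily; ONNX Runtime is not bundled

    providers = pick_providers(device, ort.get_available_providers())
    return ort.InferenceSession(str(model_path), providers=providers)
```

Keeping CPUExecutionProvider last in the list lets ONNX Runtime fall back to CPU for operators the CUDA provider does not handle.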

Required Files

Required:

  • qwen3_tts_rust.dll
  • onnx_kv/*.onnx (or onnx_kv_06b/*.onnx)
  • models/<model>/{config.json,vocab.json,merges.txt,tokenizer_config.json}
  • examples/python_dll_call/run_pipeline.py

Optional:

  • qwen3_tts.h (C/C++ bindings)
  • onnx_kv_06b/ (only for 0.6B)
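
The required-file list above can be verified before a run. This is a small sketch using only the paths named in this README; the function name missing_bundle_files and its default arguments are illustrative, not part of the bundle.

```python
from pathlib import Path

TOKENIZER_FILES = ("config.json", "vocab.json", "merges.txt", "tokenizer_config.json")


def missing_bundle_files(root, onnx_dir="onnx_kv", model="Qwen3-TTS-12Hz-1.7B-Base"):
    """Return the required entries that are absent under the bundle root."""
    root = Path(root)
    missing = []
    for rel in ("qwen3_tts_rust.dll", "examples/python_dll_call/run_pipeline.py"):
        if not (root / rel).is_file():
            missing.append(rel)
    # At least one .onnx file must exist in the chosen ONNX folder.
    if not any((root / onnx_dir).glob("*.onnx")):
        missing.append(f"{onnx_dir}/*.onnx")
    for name in TOKENIZER_FILES:
        rel = f"models/{model}/{name}"
        if not (root / rel).is_file():
            missing.append(rel)
    return missing
```

For the 0.6B model, pass onnx_dir="onnx_kv_06b" and model="Qwen3-TTS-12Hz-0.6B-Base". An empty list means every required entry was found.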

Notes

  • ONNX files are single-file (no .onnx.data, no onnx__MatMul_* shards).
  • Samples are not included. Provide your own reference audio/text.
  • First load can be slow due to large model size.
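
To confirm that an ONNX folder really is single-file, it is enough to look for the external-weight names mentioned in the first note. The sketch below assumes those two naming patterns (.onnx.data and onnx__MatMul_*) are the only ones of interest; the function name is hypothetical.

```python
from pathlib import Path


def external_weight_files(onnx_dir):
    """List file names that suggest weights are stored outside the .onnx files."""
    found = []
    for path in sorted(Path(onnx_dir).iterdir()):
        name = path.name
        if name.endswith(".onnx.data") or name.startswith("onnx__MatMul_"):
            found.append(name)
    return found
```

An empty result for onnx_kv (or onnx_kv_06b) is consistent with the embedded-weight layout described above.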

Troubleshooting

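  • DLL not found: set QWEN3_TTS_DLL to the path of qwen3_tts_rust.dll, or run from the folder that contains it.
  • CUDAExecutionProvider not available: install onnxruntime-gpu, or run with --device cpu.
  • InvalidArgument / input shape: make sure the reference audio is mono. The script resamples the audio itself, so the sample rate does not need to be converted beforehand.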
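
The mono requirement can be checked before running the pipeline. The sketch below uses Python's standard wave module, which reads uncompressed PCM WAV files only; other formats need a different reader. The helper names are illustrative and are not part of run_pipeline.py.

```python
import wave


def wav_channels_and_rate(path):
    """Return (channel count, sample rate in Hz) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels(), wav.getframerate()


def require_mono(path):
    """Raise ValueError unless the file is mono; return its sample rate."""
    channels, rate = wav_channels_and_rate(path)
    if channels != 1:
        raise ValueError(f"{path} has {channels} channels; convert it to mono first")
    return rate
```

Running require_mono on the file passed to --ref-audio reports a stereo file with a clear message instead of a shape error from ONNX Runtime.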

License

Apache-2.0. This bundle is derived from Qwen3-TTS: https://github.com/QwenLM/Qwen3-TTS
