Fish Speech S2 Pro — Mirror
Mirror of the Fish Speech S2 Pro model by Fish Audio.
Original model: fishaudio/fish-speech-1.5
Available Files
| File | Size | Description |
|---|---|---|
| model.safetensors | 9.12 GB | Main language model weights |
| codec.pth | 1.87 GB | Audio codec (encoder/decoder) |
| config.json | 1.86 KB | Model configuration |
| tokenizer.json | 12.2 MB | Tokenizer data |
| tokenizer_config.json | 861 KB | Tokenizer configuration |
| special_tokens_map.json | 102 KB | Special tokens mapping |
| chat_template.jinja | 4.12 KB | Chat template |
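After a manual download, it can be useful to confirm that every file listed above is present before pointing a tool at the folder. The following is a minimal sketch using only the Python standard library; the names `EXPECTED_FILES` and `missing_files` are hypothetical helpers introduced for illustration and are not part of Fish Speech or ComfyUI-FFMPEGA. It checks file presence only, not sizes or checksums.

```python
from pathlib import Path

# File names copied from the "Available Files" table.
EXPECTED_FILES = (
    "model.safetensors",
    "codec.pth",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
)


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files("path/to/model")` returns an empty list when the folder is complete, and otherwise the names that still need to be downloaded, in table order.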
Model Details
Fish Speech is an open-source text-to-speech (TTS) model that supports voice cloning and multilingual speech synthesis. The S2 Pro variant offers improved output quality and zero-shot voice cloning, meaning it can imitate a voice from a reference sample without additional fine-tuning.
- Architecture: Qwen3-based language model + audio codec
- Task: Text-to-speech, voice cloning
- Languages: English, Chinese, Japanese, and more
- Code: github.com/fishaudio/fish-speech
Usage with ComfyUI-FFMPEGA
This model is automatically downloaded and used by the ComfyUI-FFMPEGA extension for TTS and voice cloning features.
License
Fish Audio Research License — see LICENSE file.
- ✅ Free for research and non-commercial use
- ❌ Commercial use requires a separate license from Fish Audio (contact: business@fish.audio)