Fish Speech S2 Pro — Mirror

Mirror of the Fish Speech S2 Pro model by Fish Audio.

Original model: fishaudio/fish-speech-1.5

Available Files

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 9.12 GB | Main language model weights |
| `codec.pth` | 1.87 GB | Audio codec (encoder/decoder) |
| `config.json` | 1.86 KB | Model configuration |
| `tokenizer.json` | 12.2 MB | Tokenizer data |
| `tokenizer_config.json` | 861 KB | Tokenizer configuration |
| `special_tokens_map.json` | 102 KB | Special tokens mapping |
| `chat_template.jinja` | 4.12 KB | Chat template |
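
To confirm that a local copy of the mirror is complete, a minimal sketch like the following may help. It only compares file names against the list above and does not verify sizes or checksums. The helper name `missing_files` and the directory path are hypothetical, chosen for illustration.

```python
from pathlib import Path

# File names taken from the "Available Files" list above.
EXPECTED_FILES = [
    "model.safetensors",
    "codec.pth",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
]


def missing_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "./fish-speech-s2-pro" is a hypothetical local path, not a required location.
    missing = missing_files("./fish-speech-s2-pro")
    print("missing:", missing if missing else "none")
```

An empty result means every listed file exists in the directory. It says nothing about whether a download was truncated.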

Model Details

Fish Speech is an open-source text-to-speech (TTS) model family that supports voice cloning and multilingual speech synthesis. The S2 Pro variant offers improved output quality and zero-shot voice cloning, that is, cloning a voice from a reference audio clip without fine-tuning the model.

Architecture: Qwen3-based language model + audio codec
Task: Text-to-speech, voice cloning
Languages: English, Chinese, Japanese, and more
Code: github.com/fishaudio/fish-speech

Usage with ComfyUI-FFMPEGA

This model is automatically downloaded and used by the ComfyUI-FFMPEGA extension for TTS and voice cloning features.
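
The extension handles the download itself, so no manual step is needed. For fetching the files by hand (for example, to inspect the configuration), a hedged sketch using `snapshot_download` from the `huggingface_hub` package might look like this. The repository id is assumed to be `AEmotionStudio/fish-speech-s2-pro`, and the helper names `download_kwargs` and `download` are hypothetical. The directory where ComfyUI-FFMPEGA expects the files is not stated here, so this sketch does not try to place them for the extension.

```python
# Assumed repository id of this mirror on the Hugging Face Hub.
REPO_ID = "AEmotionStudio/fish-speech-s2-pro"

# Patterns matching only the small JSON/Jinja files, which avoids pulling
# the roughly 11 GB of model and codec weights.
SMALL_FILE_PATTERNS = ["*.json", "*.jinja"]


def download_kwargs(local_dir=None, include_weights=True):
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {"repo_id": REPO_ID}
    if local_dir is not None:
        kwargs["local_dir"] = local_dir
    if not include_weights:
        kwargs["allow_patterns"] = SMALL_FILE_PATTERNS
    return kwargs


def download(local_dir=None, include_weights=True):
    """Download the mirror and return the local snapshot path."""
    # Imported lazily so download_kwargs works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir, include_weights))
```

Calling `download(include_weights=False)` would fetch only the configuration and tokenizer files. Calling `download()` fetches everything, including `model.safetensors` and `codec.pth`.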

License

Fish Audio Research License — see LICENSE file.

✅ Free for research and non-commercial use
❌ Commercial use requires a separate license from Fish Audio (contact: business@fish.audio)
