Fish Speech S2 Pro — Mirror

Mirror of the Fish Speech S2 Pro model by Fish Audio.

Original model: fishaudio/fish-speech-1.5

Available Files

File                      Size      Description
model.safetensors         9.12 GB   Main language model weights
codec.pth                 1.87 GB   Audio codec (encoder/decoder)
config.json               1.86 KB   Model configuration
tokenizer.json            12.2 MB   Tokenizer data
tokenizer_config.json     861 KB    Tokenizer configuration
special_tokens_map.json   102 KB    Special tokens mapping
chat_template.jinja       4.12 KB   Chat template
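
To check a local copy against the file list above, a minimal sketch follows. It assumes all files sit in one directory under the names listed. `EXPECTED_FILES`, `missing_files`, and `total_size_gb` are illustrative names, not part of any Fish Audio tooling, and the sizes are the approximate values from the list, read as decimal units.

```python
from pathlib import Path

# Approximate sizes in bytes, copied from the file list above (decimal units).
# This mapping is illustrative; it is not shipped with the model.
EXPECTED_FILES = {
    "model.safetensors": 9.12e9,
    "codec.pth": 1.87e9,
    "config.json": 1.86e3,
    "tokenizer.json": 12.2e6,
    "tokenizer_config.json": 861e3,
    "special_tokens_map.json": 102e3,
    "chat_template.jinja": 4.12e3,
}


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir, sorted."""
    root = Path(model_dir)
    return sorted(name for name in EXPECTED_FILES if not (root / name).is_file())


def total_size_gb():
    """Sum the listed sizes and convert to gigabytes (1 GB = 1e9 bytes)."""
    return sum(EXPECTED_FILES.values()) / 1e9
```

Calling `missing_files("path/to/model")` returns an empty list when every listed file is present. The listed sizes add up to roughly 11 GB, so allow at least that much free disk space.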

Model Details

Fish Speech is an open-source text-to-speech (TTS) model that supports voice cloning and multilingual speech synthesis. The S2 Pro variant offers improved output quality and zero-shot voice cloning.

  • Architecture: Qwen3-based language model + audio codec
  • Task: Text-to-speech, voice cloning
  • Languages: English, Chinese, Japanese, and more
  • Code: github.com/fishaudio/fish-speech

Usage with ComfyUI-FFMPEGA

This model is automatically downloaded and used by the ComfyUI-FFMPEGA extension for TTS and voice cloning features.
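
For a manual download outside ComfyUI, the sketch below builds direct-download URLs using the Hugging Face Hub `resolve` URL pattern. It assumes the mirror's repository id is `AEmotionStudio/fish-speech-s2-pro` and that the files live on the `main` branch. `REPO_ID`, `REVISION`, `FILES`, `resolve_url`, and `download_urls` are hypothetical names chosen for illustration. Where ComfyUI-FFMPEGA stores the files is not covered here.

```python
from urllib.parse import quote

# Assumed repository id and branch; adjust if the mirror is hosted elsewhere.
REPO_ID = "AEmotionStudio/fish-speech-s2-pro"
REVISION = "main"

# File names as listed under "Available Files".
FILES = [
    "model.safetensors",
    "codec.pth",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
]


def resolve_url(filename, repo_id=REPO_ID, revision=REVISION):
    """Build the Hub direct-download URL for one file in the repository."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


def download_urls():
    """Return one URL per listed file, in the order given in FILES."""
    return [resolve_url(name) for name in FILES]
```

The resulting URLs can be passed to any HTTP client. The two weight files are large, so a client that can resume interrupted transfers is the safer choice.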

License

Fish Audio Research License — see LICENSE file.

