Instructions to use ResembleAI/chatterbox with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Chatterbox
How to use ResembleAI/chatterbox with Chatterbox:
```python
# pip install chatterbox-tts
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
```
- Inference
- Notebooks
- Google Colab
- Kaggle
Fine-Tuning Chatterbox for Multilingual & Emotion-Aware Conversations (Persian Included)
Hi ResembleAI Team,
We really appreciate your work on Chatterbox — it’s an impressive conversational model with strong emotional understanding and expressive dialogue abilities.
We’d like to contribute by helping extend Chatterbox to new languages, especially Persian (Farsi). Our goal is to improve multilingual performance and also integrate emotional and affective layers into the responses to make interactions feel more natural across cultures.
We have access to powerful compute resources (GPU clusters) and a dedicated team ready to work on data curation and fine-tuning.
Could you please share a technical guide or documentation on how to:
- Fine-tune or extend Chatterbox on new datasets.
- Adjust the emotional tone or expression in dialogue.
- Properly evaluate and align the model for new languages.
We’d be glad to collaborate or follow your best practices if there’s an internal pipeline or API structure for model training.
Thanks again for making this model public — we’re excited to help expand it further!
Best,
Hamed
The model won’t be released publicly, which means finding the full training code will be quite difficult.