Glitches at the start of audio output when using the PyTorch MPS backend on Apple Silicon

by Phaserblast - opened Feb 28

I'm testing this model on a MacBook M4 using the PyTorch MPS backend, with the Vivian voice. About 99% of the time, the generated audio starts with a brief glitch of Chinese nonsense before the actual English output begins. Sometimes the model hallucinates and generates total nonsense for 30 seconds to a minute, even though the input text is only a few words.
This doesn't happen when running on the CPU (apart from it being slower, of course), and a CUDA system consistently works fine.
Any ideas as to where I should look to debug this?
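
One way to narrow this down might be to run the same submodule on the CPU and on MPS with identical weights and inputs, then look for the first place where the outputs diverge. The snippet below is only a rough sketch of that idea: `max_abs_diff` is a hypothetical helper, and the `Linear` layer is a stand-in for a real submodule of the model (nothing here comes from the model's own code). It falls back to the CPU when MPS isn't available so it still runs elsewhere.

```python
import copy

import torch


def max_abs_diff(module, example_input, device):
    """Run `module` on the CPU and on `device`, and return the largest
    absolute difference between the two outputs (hypothetical helper)."""
    module = module.eval()
    with torch.no_grad():
        ref = module(example_input)                    # CPU reference output
        dev_module = copy.deepcopy(module).to(device)  # same weights, other device
        out = dev_module(example_input.to(device)).cpu()
    return (ref - out).abs().max().item()


# Use MPS when present, otherwise fall back to the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

torch.manual_seed(0)
layer = torch.nn.Linear(16, 16)  # stand-in; swap in a submodule of the real model
x = torch.randn(4, 16)

print(device, max_abs_diff(layer, x, device))
```

Small float32 differences between backends are expected, so the interesting result would be a submodule where the difference is much larger than for its neighbours.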
