Spire extends Tower to the speech modality. Spire models are multimodal LLMs that can transcribe English speech and translate it into 9 different languages.