wol-genesis-audio-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 0.0560

Model description

More information needed

Intended uses & limitations

More information needed
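The card gives no usage details, but SpeechT5 fine-tunes are normally loaded through the standard Transformers TTS classes. A minimal sketch follows; the repository id is taken from this card's title, the zero speaker embedding is a placeholder (substitute one extracted from a reference recording of the target voice), and running the script requires network access to download the weights:

```python
import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# Repository id assumed from the card title; downloading requires network access.
MODEL_ID = "sil-ai/wol-genesis-audio-aligned-speecht5"

processor = SpeechT5Processor.from_pretrained(MODEL_ID)
model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Replace with text in the model's training language.
inputs = processor(text="...", return_tensors="pt")

# SpeechT5 conditions on a 512-dim x-vector speaker embedding;
# a zero vector is only a placeholder and will not match the target voice.
speaker_embeddings = torch.zeros((1, 512))

with torch.no_grad():
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )

# `speech` is a 1-D float tensor at 16 kHz; save it with e.g.:
# import soundfile as sf; sf.write("out.wav", speech.numpy(), samplerate=16000)
```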

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
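The schedule above (linear warmup for 4,000 steps, then cosine decay over the remaining 36,000) and the effective batch size can be sketched in plain Python. This is the standard warmup-plus-cosine formula, not the exact Transformers implementation:

```python
import math

LEARNING_RATE = 1e-4   # learning_rate
WARMUP_STEPS = 4000    # lr_scheduler_warmup_steps
TRAINING_STEPS = 40000 # training_steps

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size = per-device batch * gradient accumulation steps
effective_batch = 8 * 4  # = 32, matching total_train_batch_size

print(lr_at(0))      # 0.0 at the start of warmup
print(lr_at(4000))   # peak of 1e-4 at the end of warmup
print(lr_at(40000))  # decays to 0.0 at the final step
```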

Training results

| Training Loss | Epoch    | Step  | Validation Loss |
|:-------------:|:--------:|:-----:|:---------------:|
| 0.0565        | 18.1843  | 1000  | 0.0462          |
| 0.052         | 36.3687  | 2000  | 0.0430          |
| 0.0473        | 54.5530  | 3000  | 0.0430          |
| 0.0483        | 72.7373  | 4000  | 0.0447          |
| 0.0432        | 90.9217  | 5000  | 0.0442          |
| 0.0366        | 109.0922 | 6000  | 0.0450          |
| 0.0357        | 127.2765 | 7000  | 0.0471          |
| 0.0351        | 145.4608 | 8000  | 0.0476          |
| 0.0346        | 163.6452 | 9000  | 0.0477          |
| 0.0329        | 181.8295 | 10000 | 0.0493          |
| 0.0367        | 200.0    | 11000 | 0.0498          |
| 0.032         | 218.1843 | 12000 | 0.0507          |
| 0.0314        | 236.3687 | 13000 | 0.0500          |
| 0.0304        | 254.5530 | 14000 | 0.0518          |
| 0.0296        | 272.7373 | 15000 | 0.0516          |
| 0.0333        | 290.9217 | 16000 | 0.0517          |
| 0.0317        | 309.0922 | 17000 | 0.0549          |
| 0.0284        | 327.2765 | 18000 | 0.0549          |
| 0.0282        | 345.4608 | 19000 | 0.0538          |
| 0.0275        | 363.6452 | 20000 | 0.0540          |
| 0.0266        | 381.8295 | 21000 | 0.0540          |
| 0.0284        | 400.0    | 22000 | 0.0563          |
| 0.025         | 418.1843 | 23000 | 0.0544          |
| 0.025         | 436.3687 | 24000 | 0.0553          |
| 0.0259        | 454.5530 | 25000 | 0.0559          |
| 0.0248        | 472.7373 | 26000 | 0.0560          |
| 0.0245        | 490.9217 | 27000 | 0.0565          |
| 0.0239        | 509.0922 | 28000 | 0.0560          |
| 0.0235        | 527.2765 | 29000 | 0.0562          |
| 0.0245        | 545.4608 | 30000 | 0.0559          |
| 0.0235        | 563.6452 | 31000 | 0.0569          |
| 0.0237        | 581.8295 | 32000 | 0.0562          |
| 0.0232        | 600.0    | 33000 | 0.0558          |
| 0.0237        | 618.1843 | 34000 | 0.0570          |
| 0.0248        | 636.3687 | 35000 | 0.0562          |
| 0.0224        | 654.5530 | 36000 | 0.0570          |
| 0.0228        | 672.7373 | 37000 | 0.0560          |
| 0.0242        | 690.9217 | 38000 | 0.0556          |
| 0.023         | 709.0922 | 39000 | 0.0563          |
| 0.0244        | 727.2765 | 40000 | 0.0560          |
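Note that validation loss bottoms out early (0.0430 at steps 2000–3000) and then drifts upward while training loss keeps falling, which usually indicates overfitting; the final checkpoint is not the best one. The snippet below quantifies this from a sample of (step, validation_loss) pairs copied from the table above:

```python
# (step, validation_loss) pairs sampled from the results table above
history = [
    (1000, 0.0462), (2000, 0.0430), (3000, 0.0430), (5000, 0.0442),
    (10000, 0.0493), (20000, 0.0540), (30000, 0.0559), (40000, 0.0560),
]

# Earliest step reaching the lowest validation loss
best_step, best_loss = min(history, key=lambda row: row[1])
print(best_step, best_loss)  # 2000 0.043

# The final checkpoint is ~30% worse than the best one
final_step, final_loss = history[-1]
degradation = (final_loss - best_loss) / best_loss
print(round(degradation, 3))  # 0.302
```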

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.2
Model tree: sil-ai/wol-genesis-audio-aligned-speecht5, fine-tuned from microsoft/speecht5_tts