wol-genesis-audio-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts; the training dataset is not specified. On the evaluation set, it reaches a validation loss of 0.0560 at the final training step (40,000).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
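
The arithmetic behind two of the values above can be sketched in plain Python. This is an illustration only, not the training code: it assumes a single device (which is what 8 × 4 = 32 implies) and the usual "linear warmup, then half-cosine decay to zero" shape for a cosine schedule with warmup. The names `NUM_DEVICES`, `effective_batch_size`, and `lr_at` are hypothetical helpers introduced here.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000

# Assumption: one device, consistent with total_train_batch_size = 32.
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Number of examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step.

    Assumed shape: linear warmup from 0 to the peak rate, followed by a
    half-cosine decay from the peak down to 0 at the last step.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 2000, 4000, 22000, 40000):
        print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 1e-4 at step 4000, falls to half of that at step 22000 (the midpoint of the decay phase), and reaches zero at step 40000.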

Training Loss	Epoch	Step	Validation Loss
0.0565	18.1843	1000	0.0462
0.052	36.3687	2000	0.0430
0.0473	54.5530	3000	0.0430
0.0483	72.7373	4000	0.0447
0.0432	90.9217	5000	0.0442
0.0366	109.0922	6000	0.0450
0.0357	127.2765	7000	0.0471
0.0351	145.4608	8000	0.0476
0.0346	163.6452	9000	0.0477
0.0329	181.8295	10000	0.0493
0.0367	200.0	11000	0.0498
0.032	218.1843	12000	0.0507
0.0314	236.3687	13000	0.0500
0.0304	254.5530	14000	0.0518
0.0296	272.7373	15000	0.0516
0.0333	290.9217	16000	0.0517
0.0317	309.0922	17000	0.0549
0.0284	327.2765	18000	0.0549
0.0282	345.4608	19000	0.0538
0.0275	363.6452	20000	0.0540
0.0266	381.8295	21000	0.0540
0.0284	400.0	22000	0.0563
0.025	418.1843	23000	0.0544
0.025	436.3687	24000	0.0553
0.0259	454.5530	25000	0.0559
0.0248	472.7373	26000	0.0560
0.0245	490.9217	27000	0.0565
0.0239	509.0922	28000	0.0560
0.0235	527.2765	29000	0.0562
0.0245	545.4608	30000	0.0559
0.0235	563.6452	31000	0.0569
0.0237	581.8295	32000	0.0562
0.0232	600.0	33000	0.0558
0.0237	618.1843	34000	0.0570
0.0248	636.3687	35000	0.0562
0.0224	654.5530	36000	0.0570
0.0228	672.7373	37000	0.0560
0.0242	690.9217	38000	0.0556
0.023	709.0922	39000	0.0563
0.0244	727.2765	40000	0.0560
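
To read the table at a glance, the rows can be parsed and the step with the lowest validation loss located. The sketch below is a reading aid, not part of the training pipeline: the rows are copied from the table above, and `parse_results` is a hypothetical helper. Whether an earlier checkpoint is actually preferable depends on output audio quality, which validation loss alone does not establish.

```python
# Rows copied from the table above:
# training loss, epoch, step, validation loss.
RESULTS = """\
0.0565 18.1843 1000 0.0462
0.052 36.3687 2000 0.0430
0.0473 54.5530 3000 0.0430
0.0483 72.7373 4000 0.0447
0.0432 90.9217 5000 0.0442
0.0366 109.0922 6000 0.0450
0.0357 127.2765 7000 0.0471
0.0351 145.4608 8000 0.0476
0.0346 163.6452 9000 0.0477
0.0329 181.8295 10000 0.0493
0.0367 200.0 11000 0.0498
0.032 218.1843 12000 0.0507
0.0314 236.3687 13000 0.0500
0.0304 254.5530 14000 0.0518
0.0296 272.7373 15000 0.0516
0.0333 290.9217 16000 0.0517
0.0317 309.0922 17000 0.0549
0.0284 327.2765 18000 0.0549
0.0282 345.4608 19000 0.0538
0.0275 363.6452 20000 0.0540
0.0266 381.8295 21000 0.0540
0.0284 400.0 22000 0.0563
0.025 418.1843 23000 0.0544
0.025 436.3687 24000 0.0553
0.0259 454.5530 25000 0.0559
0.0248 472.7373 26000 0.0560
0.0245 490.9217 27000 0.0565
0.0239 509.0922 28000 0.0560
0.0235 527.2765 29000 0.0562
0.0245 545.4608 30000 0.0559
0.0235 563.6452 31000 0.0569
0.0237 581.8295 32000 0.0562
0.0232 600.0 33000 0.0558
0.0237 618.1843 34000 0.0570
0.0248 636.3687 35000 0.0562
0.0224 654.5530 36000 0.0570
0.0228 672.7373 37000 0.0560
0.0242 690.9217 38000 0.0556
0.023 709.0922 39000 0.0563
0.0244 727.2765 40000 0.0560
"""


def parse_results(text: str) -> list[dict]:
    """Turn whitespace-separated rows into dictionaries with typed fields."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


rows = parse_results(RESULTS)
best = min(rows, key=lambda r: r["val_loss"])  # the earliest row wins on ties
final = rows[-1]

if __name__ == "__main__":
    print(f"lowest validation loss: {best['val_loss']:.4f} at step {best['step']}")
    print(f"final validation loss:  {final['val_loss']:.4f} at step {final['step']}")
```

The lowest validation loss in the table is 0.0430, first reached at step 2000 and matched at step 3000. After that, validation loss drifts up to 0.0560 while training loss keeps falling, a pattern that is usually read as overfitting.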

Safetensors

Model size: 0.1B params
Tensor type: F32
Base model: microsoft/speecht5_tts (this model is a fine-tune)