exp_004_base_multistage_urdu

This model is a fine-tuned version of sharjeel103/whisper-base-urdu on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3947
  • Wer: 65.7765
  • Wer Ortho: 68.6697
  • Cer: 21.6281
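WER and CER are edit-distance-based metrics: the Levenshtein distance between reference and hypothesis, normalized by reference length, over words for WER and over characters for CER. A minimal sketch of the computation (assuming simple whitespace tokenization; the exact normalization used for this evaluation is not documented here):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over two sequences,
    # using a single rolling row to keep memory at O(len(hyp)).
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate in percent, tokenizing on whitespace.
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate in percent, over raw characters.
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

"Wer Ortho" is typically the orthographic WER, computed on the raw transcripts before any text normalization, which is why it is slightly higher than the normalized WER above.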

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • training_steps: 10000
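Note that the effective batch size of 64 follows from train_batch_size × gradient_accumulation_steps = 32 × 2. The learning-rate schedule ramps linearly from 0 to the peak rate over the warmup steps, then decays linearly back to 0 at the final training step; a minimal sketch of that schedule (this mirrors the standard linear-with-warmup schedule, not code from this training run):

```python
def linear_lr(step: int, peak_lr: float = 5e-06,
              warmup: int = 1000, total: int = 10000) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))
```

For example, the rate is half of peak at step 500, reaches the full 5e-06 at step 1000, and decays to 0 by step 10000.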

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Wer     | Wer Ortho | Cer     |
|:-------------:|:-------:|:-----:|:---------------:|:-------:|:---------:|:-------:|
| 5.0263        | 0.6410  | 500   | 1.5195          | 95.4066 | 95.7445   | 33.6187 |
| 3.5885        | 1.2821  | 1000  | 0.9330          | 89.3228 | 90.2797   | 30.3680 |
| 3.0302        | 1.9231  | 1500  | 0.7431          | 82.6594 | 84.2995   | 27.0128 |
| 2.6802        | 2.5641  | 2000  | 0.6501          | 79.3803 | 81.3448   | 26.0944 |
| 2.4353        | 3.2051  | 2500  | 0.5907          | 76.0507 | 78.3277   | 24.5965 |
| 2.2853        | 3.8462  | 3000  | 0.5492          | 74.3293 | 76.6987   | 24.3492 |
| 2.1399        | 4.4872  | 3500  | 0.5184          | 72.7590 | 75.1236   | 23.5756 |
| 2.0097        | 5.1282  | 4000  | 0.4915          | 70.1810 | 72.8504   | 22.8648 |
| 1.9407        | 5.7692  | 4500  | 0.4697          | 69.6855 | 72.3975   | 22.6328 |
| 1.8625        | 6.4103  | 5000  | 0.4527          | 68.6736 | 71.3502   | 22.4363 |
| 1.7877        | 7.0513  | 5500  | 0.4359          | 68.1740 | 70.8474   | 22.0212 |
| 1.7349        | 7.6923  | 6000  | 0.4235          | 68.3125 | 71.0136   | 22.4257 |
| 1.6679        | 8.3333  | 6500  | 0.4130          | 66.2846 | 69.0105   | 21.4575 |
| 1.6108        | 8.9744  | 7000  | 0.4019          | 66.2804 | 69.1560   | 21.8611 |
| 1.5329        | 9.6154  | 7500  | 0.3947          | 65.7765 | 68.6697   | 21.6281 |
| 1.5234        | 10.2564 | 8000  | 0.3889          | 66.0621 | 68.8651   | 21.4800 |
| 1.5123        | 10.8974 | 8500  | 0.3841          | 66.1796 | 69.0396   | 21.5778 |
| 1.4779        | 11.5385 | 9000  | 0.3809          | 65.9949 | 68.8900   | 21.5524 |
| 1.4631        | 12.1795 | 9500  | 0.3787          | 65.9739 | 68.8277   | 21.4915 |
| 1.4256        | 12.8205 | 10000 | 0.3778          | 65.8857 | 68.7238   | 21.5002 |

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.2