mistral_groupsss_filall_numsym_no_empty_sft_newllamafactory

This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the 000groupsss_filall_numsym_no_empty_sft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9856
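Since PEFT is listed under Framework versions below, the published weights are presumably a PEFT adapter on top of the base model rather than a full checkpoint. The snippet below is a minimal, illustrative sketch of loading it locally with peft and transformers; it assumes the adapter weights are published under the HYGGEhygge/saves repository, and the dtype and generation settings are placeholders rather than the configuration used for evaluation.

```python
# Minimal sketch: load the PEFT adapter on top of the base model and generate.
# Assumes the adapter lives in the HYGGEhygge/saves repository and that
# peft, transformers, and accelerate are installed.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

base_id = "mistralai/Mistral-Nemo-Instruct-2407"
adapter_id = "HYGGEhygge/saves"  # adapter repository (assumption)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")

messages = [{"role": "user", "content": "Briefly explain what a fine-tuned adapter is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```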

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments mapping is sketched after the list):

  • learning_rate: 3e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
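For readers who want to reproduce the run outside LLaMA-Factory, the hyperparameters above map roughly onto transformers.TrainingArguments as sketched below. The output_dir is a placeholder and anything not listed above is left at its default, so this is an approximation of the original configuration rather than an exact copy.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="saves",                 # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,      # effective train batch size: 4 * 2 = 8
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
```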

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4942 | 0.0433 | 100 | 1.4158 |
| 1.2823 | 0.0867 | 200 | 1.2383 |
| 1.2408 | 0.1300 | 300 | 1.1942 |
| 1.185 | 0.1733 | 400 | 1.1651 |
| 1.1902 | 0.2166 | 500 | 1.1461 |
| 1.1418 | 0.2600 | 600 | 1.1323 |
| 1.1738 | 0.3033 | 700 | 1.1239 |
| 1.1129 | 0.3466 | 800 | 1.1103 |
| 1.1117 | 0.3899 | 900 | 1.1075 |
| 1.1128 | 0.4333 | 1000 | 1.0945 |
| 1.1025 | 0.4766 | 1100 | 1.0913 |
| 1.1023 | 0.5199 | 1200 | 1.0849 |
| 1.0983 | 0.5633 | 1300 | 1.0816 |
| 1.0929 | 0.6066 | 1400 | 1.0730 |
| 1.1097 | 0.6499 | 1500 | 1.0680 |
| 1.0669 | 0.6932 | 1600 | 1.0634 |
| 1.0844 | 0.7366 | 1700 | 1.0567 |
| 1.0804 | 0.7799 | 1800 | 1.0548 |
| 1.0869 | 0.8232 | 1900 | 1.0544 |
| 1.0694 | 0.8666 | 2000 | 1.0505 |
| 1.0951 | 0.9099 | 2100 | 1.0467 |
| 1.0776 | 0.9532 | 2200 | 1.0414 |
| 1.056 | 0.9965 | 2300 | 1.0378 |
| 1.0229 | 1.0399 | 2400 | 1.0387 |
| 1.0311 | 1.0832 | 2500 | 1.0343 |
| 1.0029 | 1.1265 | 2600 | 1.0276 |
| 1.056 | 1.1698 | 2700 | 1.0288 |
| 1.0261 | 1.2132 | 2800 | 1.0221 |
| 1.0077 | 1.2565 | 2900 | 1.0281 |
| 1.0133 | 1.2998 | 3000 | 1.0179 |
| 0.9893 | 1.3432 | 3100 | 1.0154 |
| 0.9919 | 1.3865 | 3200 | 1.0183 |
| 0.9867 | 1.4298 | 3300 | 1.0133 |
| 0.9982 | 1.4731 | 3400 | 1.0121 |
| 0.9869 | 1.5165 | 3500 | 1.0092 |
| 0.9708 | 1.5598 | 3600 | 1.0043 |
| 0.9753 | 1.6031 | 3700 | 1.0024 |
| 0.9943 | 1.6464 | 3800 | 1.0022 |
| 0.973 | 1.6898 | 3900 | 1.0018 |
| 0.9685 | 1.7331 | 4000 | 0.9978 |
| 0.9632 | 1.7764 | 4100 | 0.9978 |
| 0.9834 | 1.8198 | 4200 | 0.9954 |
| 0.9473 | 1.8631 | 4300 | 0.9947 |
| 0.9838 | 1.9064 | 4400 | 0.9923 |
| 0.9623 | 1.9497 | 4500 | 0.9927 |
| 0.9688 | 1.9931 | 4600 | 0.9908 |
| 0.9431 | 2.0364 | 4700 | 0.9928 |
| 0.9526 | 2.0797 | 4800 | 0.9908 |
| 0.9415 | 2.1231 | 4900 | 0.9921 |
| 0.9378 | 2.1664 | 5000 | 0.9903 |
| 0.9448 | 2.2097 | 5100 | 0.9893 |
| 0.953 | 2.2530 | 5200 | 0.9897 |
| 0.9218 | 2.2964 | 5300 | 0.9884 |
| 0.9147 | 2.3397 | 5400 | 0.9873 |
| 0.9352 | 2.3830 | 5500 | 0.9865 |
| 0.9085 | 2.4263 | 5600 | 0.9879 |
| 0.9366 | 2.4697 | 5700 | 0.9869 |
| 0.9063 | 2.5130 | 5800 | 0.9865 |
| 0.9511 | 2.5563 | 5900 | 0.9867 |
| 0.9445 | 2.5997 | 6000 | 0.9858 |
| 0.9453 | 2.6430 | 6100 | 0.9866 |
| 0.93 | 2.6863 | 6200 | 0.9864 |
| 0.9382 | 2.7296 | 6300 | 0.9862 |
| 0.9563 | 2.7730 | 6400 | 0.9855 |
| 0.9392 | 2.8163 | 6500 | 0.9855 |
| 0.939 | 2.8596 | 6600 | 0.9857 |
| 0.9171 | 2.9029 | 6700 | 0.9856 |
| 0.9217 | 2.9463 | 6800 | 0.9856 |
| 0.941 | 2.9896 | 6900 | 0.9856 |
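Assuming the reported validation loss is the usual mean per-token cross-entropy in nats (the Trainer default for causal language modeling), the final value of 0.9856 corresponds to a token-level perplexity of roughly exp(0.9856) ≈ 2.68. A quick check:

```python
import math

final_eval_loss = 0.9856              # final validation loss from the table above
perplexity = math.exp(final_eval_loss)
print(f"validation perplexity ≈ {perplexity:.2f}")  # ≈ 2.68
```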

Framework versions

  • PEFT 0.15.2
  • Transformers 4.57.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.2