mistral_groupsss_filall_numsym_no_empty_sft_newllamafactory

This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the 000groupsss_filall_numsym_no_empty_sft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9856
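Since PEFT is listed under Framework versions below, the published weights are presumably a PEFT adapter on top of the base model rather than a full checkpoint. The snippet below is a minimal, illustrative sketch of loading it locally with peft and transformers; it assumes the adapter weights are published under the HYGGEhygge/saves repository, and the dtype and generation settings are placeholders rather than the configuration used for evaluation.

```python
# Minimal sketch: load the PEFT adapter on top of the base model and generate.
# Assumes the adapter lives in the HYGGEhygge/saves repository and that
# peft, transformers, and accelerate are installed.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

base_id = "mistralai/Mistral-Nemo-Instruct-2407"
adapter_id = "HYGGEhygge/saves"  # adapter repository (assumption)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")

messages = [{"role": "user", "content": "Briefly explain what a fine-tuned adapter is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```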

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments mapping is sketched after the list):

  • learning_rate: 3e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
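For readers who want to reproduce the run outside LLaMA-Factory, the hyperparameters above map roughly onto transformers.TrainingArguments as sketched below. The output_dir is a placeholder and anything not listed above is left at its default, so this is an approximation of the original configuration rather than an exact copy.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="saves",                 # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,      # effective train batch size: 4 * 2 = 8
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
```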

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4942 | 0.0433 | 100 | 1.4158 |
| 1.2823 | 0.0867 | 200 | 1.2383 |
| 1.2408 | 0.1300 | 300 | 1.1942 |
| 1.185 | 0.1733 | 400 | 1.1651 |
| 1.1902 | 0.2166 | 500 | 1.1461 |
| 1.1418 | 0.2600 | 600 | 1.1323 |
| 1.1738 | 0.3033 | 700 | 1.1239 |
| 1.1129 | 0.3466 | 800 | 1.1103 |
| 1.1117 | 0.3899 | 900 | 1.1075 |
| 1.1128 | 0.4333 | 1000 | 1.0945 |
| 1.1025 | 0.4766 | 1100 | 1.0913 |
| 1.1023 | 0.5199 | 1200 | 1.0849 |
| 1.0983 | 0.5633 | 1300 | 1.0816 |
| 1.0929 | 0.6066 | 1400 | 1.0730 |
| 1.1097 | 0.6499 | 1500 | 1.0680 |
| 1.0669 | 0.6932 | 1600 | 1.0634 |
| 1.0844 | 0.7366 | 1700 | 1.0567 |
| 1.0804 | 0.7799 | 1800 | 1.0548 |
| 1.0869 | 0.8232 | 1900 | 1.0544 |
| 1.0694 | 0.8666 | 2000 | 1.0505 |
| 1.0951 | 0.9099 | 2100 | 1.0467 |
| 1.0776 | 0.9532 | 2200 | 1.0414 |
| 1.056 | 0.9965 | 2300 | 1.0378 |
| 1.0229 | 1.0399 | 2400 | 1.0387 |
| 1.0311 | 1.0832 | 2500 | 1.0343 |
| 1.0029 | 1.1265 | 2600 | 1.0276 |
| 1.056 | 1.1698 | 2700 | 1.0288 |
| 1.0261 | 1.2132 | 2800 | 1.0221 |
| 1.0077 | 1.2565 | 2900 | 1.0281 |
| 1.0133 | 1.2998 | 3000 | 1.0179 |
| 0.9893 | 1.3432 | 3100 | 1.0154 |
| 0.9919 | 1.3865 | 3200 | 1.0183 |
| 0.9867 | 1.4298 | 3300 | 1.0133 |
| 0.9982 | 1.4731 | 3400 | 1.0121 |
| 0.9869 | 1.5165 | 3500 | 1.0092 |
| 0.9708 | 1.5598 | 3600 | 1.0043 |
| 0.9753 | 1.6031 | 3700 | 1.0024 |
| 0.9943 | 1.6464 | 3800 | 1.0022 |
| 0.973 | 1.6898 | 3900 | 1.0018 |
| 0.9685 | 1.7331 | 4000 | 0.9978 |
| 0.9632 | 1.7764 | 4100 | 0.9978 |
| 0.9834 | 1.8198 | 4200 | 0.9954 |
| 0.9473 | 1.8631 | 4300 | 0.9947 |
| 0.9838 | 1.9064 | 4400 | 0.9923 |
| 0.9623 | 1.9497 | 4500 | 0.9927 |
| 0.9688 | 1.9931 | 4600 | 0.9908 |
| 0.9431 | 2.0364 | 4700 | 0.9928 |
| 0.9526 | 2.0797 | 4800 | 0.9908 |
| 0.9415 | 2.1231 | 4900 | 0.9921 |
| 0.9378 | 2.1664 | 5000 | 0.9903 |
| 0.9448 | 2.2097 | 5100 | 0.9893 |
| 0.953 | 2.2530 | 5200 | 0.9897 |
| 0.9218 | 2.2964 | 5300 | 0.9884 |
| 0.9147 | 2.3397 | 5400 | 0.9873 |
| 0.9352 | 2.3830 | 5500 | 0.9865 |
| 0.9085 | 2.4263 | 5600 | 0.9879 |
| 0.9366 | 2.4697 | 5700 | 0.9869 |
| 0.9063 | 2.5130 | 5800 | 0.9865 |
| 0.9511 | 2.5563 | 5900 | 0.9867 |
| 0.9445 | 2.5997 | 6000 | 0.9858 |
| 0.9453 | 2.6430 | 6100 | 0.9866 |
| 0.93 | 2.6863 | 6200 | 0.9864 |
| 0.9382 | 2.7296 | 6300 | 0.9862 |
| 0.9563 | 2.7730 | 6400 | 0.9855 |
| 0.9392 | 2.8163 | 6500 | 0.9855 |
| 0.939 | 2.8596 | 6600 | 0.9857 |
| 0.9171 | 2.9029 | 6700 | 0.9856 |
| 0.9217 | 2.9463 | 6800 | 0.9856 |
| 0.941 | 2.9896 | 6900 | 0.9856 |
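Assuming the reported validation loss is the usual mean per-token cross-entropy in nats (the Trainer default for causal language modeling), the final value of 0.9856 corresponds to a token-level perplexity of roughly exp(0.9856) ≈ 2.68. A quick check:

```python
import math

final_eval_loss = 0.9856              # final validation loss from the table above
perplexity = math.exp(final_eval_loss)
print(f"validation perplexity ≈ {perplexity:.2f}")  # ≈ 2.68
```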

Framework versions

  • PEFT 0.15.2
  • Transformers 4.57.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.2