# mistral_groupsss_filall_numsym_no_empty_sft_newllamafactory
This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the 000groupsss_filall_numsym_no_empty_sft dataset. It achieves the following results on the evaluation set:
- Loss: 0.9856
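
The card ships no usage snippet, so the following is a minimal inference sketch with Transformers and PEFT (both listed under framework versions below). The adapter repo id is an assumption taken from the model tree at the end of this card; adjust dtype and device settings to your hardware.

```python
# Minimal inference sketch (not from the card): load the base model and
# apply the PEFT adapter on top. The adapter repo id is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-Nemo-Instruct-2407"
adapter_id = "HYGGEhygge/saves"  # assumed adapter repo id (see model tree below)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Say hello in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```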
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (an approximate `TrainingArguments` sketch follows the list):
- learning_rate: 3e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
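
For readers who don't use LLaMA-Factory, the list above maps roughly onto Transformers `TrainingArguments` as in the sketch below. This is an approximation of the effective settings, not the original training config; in particular, `output_dir` and `bf16` are assumptions.

```python
# Approximate TrainingArguments mirroring the hyperparameters listed above.
# The run itself was driven by LLaMA-Factory, so treat this as a sketch only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="saves",             # placeholder, not the original path
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch_fused",      # AdamW with betas=(0.9, 0.999), eps=1e-08
    bf16=True,                      # assumption: precision is not stated in the card
)
```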
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.4942 | 0.0433 | 100 | 1.4158 |
| 1.2823 | 0.0867 | 200 | 1.2383 |
| 1.2408 | 0.1300 | 300 | 1.1942 |
| 1.185 | 0.1733 | 400 | 1.1651 |
| 1.1902 | 0.2166 | 500 | 1.1461 |
| 1.1418 | 0.2600 | 600 | 1.1323 |
| 1.1738 | 0.3033 | 700 | 1.1239 |
| 1.1129 | 0.3466 | 800 | 1.1103 |
| 1.1117 | 0.3899 | 900 | 1.1075 |
| 1.1128 | 0.4333 | 1000 | 1.0945 |
| 1.1025 | 0.4766 | 1100 | 1.0913 |
| 1.1023 | 0.5199 | 1200 | 1.0849 |
| 1.0983 | 0.5633 | 1300 | 1.0816 |
| 1.0929 | 0.6066 | 1400 | 1.0730 |
| 1.1097 | 0.6499 | 1500 | 1.0680 |
| 1.0669 | 0.6932 | 1600 | 1.0634 |
| 1.0844 | 0.7366 | 1700 | 1.0567 |
| 1.0804 | 0.7799 | 1800 | 1.0548 |
| 1.0869 | 0.8232 | 1900 | 1.0544 |
| 1.0694 | 0.8666 | 2000 | 1.0505 |
| 1.0951 | 0.9099 | 2100 | 1.0467 |
| 1.0776 | 0.9532 | 2200 | 1.0414 |
| 1.056 | 0.9965 | 2300 | 1.0378 |
| 1.0229 | 1.0399 | 2400 | 1.0387 |
| 1.0311 | 1.0832 | 2500 | 1.0343 |
| 1.0029 | 1.1265 | 2600 | 1.0276 |
| 1.056 | 1.1698 | 2700 | 1.0288 |
| 1.0261 | 1.2132 | 2800 | 1.0221 |
| 1.0077 | 1.2565 | 2900 | 1.0281 |
| 1.0133 | 1.2998 | 3000 | 1.0179 |
| 0.9893 | 1.3432 | 3100 | 1.0154 |
| 0.9919 | 1.3865 | 3200 | 1.0183 |
| 0.9867 | 1.4298 | 3300 | 1.0133 |
| 0.9982 | 1.4731 | 3400 | 1.0121 |
| 0.9869 | 1.5165 | 3500 | 1.0092 |
| 0.9708 | 1.5598 | 3600 | 1.0043 |
| 0.9753 | 1.6031 | 3700 | 1.0024 |
| 0.9943 | 1.6464 | 3800 | 1.0022 |
| 0.973 | 1.6898 | 3900 | 1.0018 |
| 0.9685 | 1.7331 | 4000 | 0.9978 |
| 0.9632 | 1.7764 | 4100 | 0.9978 |
| 0.9834 | 1.8198 | 4200 | 0.9954 |
| 0.9473 | 1.8631 | 4300 | 0.9947 |
| 0.9838 | 1.9064 | 4400 | 0.9923 |
| 0.9623 | 1.9497 | 4500 | 0.9927 |
| 0.9688 | 1.9931 | 4600 | 0.9908 |
| 0.9431 | 2.0364 | 4700 | 0.9928 |
| 0.9526 | 2.0797 | 4800 | 0.9908 |
| 0.9415 | 2.1231 | 4900 | 0.9921 |
| 0.9378 | 2.1664 | 5000 | 0.9903 |
| 0.9448 | 2.2097 | 5100 | 0.9893 |
| 0.953 | 2.2530 | 5200 | 0.9897 |
| 0.9218 | 2.2964 | 5300 | 0.9884 |
| 0.9147 | 2.3397 | 5400 | 0.9873 |
| 0.9352 | 2.3830 | 5500 | 0.9865 |
| 0.9085 | 2.4263 | 5600 | 0.9879 |
| 0.9366 | 2.4697 | 5700 | 0.9869 |
| 0.9063 | 2.5130 | 5800 | 0.9865 |
| 0.9511 | 2.5563 | 5900 | 0.9867 |
| 0.9445 | 2.5997 | 6000 | 0.9858 |
| 0.9453 | 2.6430 | 6100 | 0.9866 |
| 0.93 | 2.6863 | 6200 | 0.9864 |
| 0.9382 | 2.7296 | 6300 | 0.9862 |
| 0.9563 | 2.7730 | 6400 | 0.9855 |
| 0.9392 | 2.8163 | 6500 | 0.9855 |
| 0.939 | 2.8596 | 6600 | 0.9857 |
| 0.9171 | 2.9029 | 6700 | 0.9856 |
| 0.9217 | 2.9463 | 6800 | 0.9856 |
| 0.941 | 2.9896 | 6900 | 0.9856 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.57.3
- Pytorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.2
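
To rule out version drift when reproducing the evaluation loss, a quick check that the installed packages match the versions above (a sketch; package names are assumed to match their PyPI distributions):

```python
# Print installed versions of the packages pinned above for comparison.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("peft", "transformers", "torch", "datasets", "tokenizers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```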
## Model tree for HYGGEhygge/saves

- Base model: mistralai/Mistral-Nemo-Base-2407
- Finetuned from: mistralai/Mistral-Nemo-Instruct-2407