---
library_name: peft
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- generated_from_trainer
model-index:
- name: lemexp-task1-v3-lemma_object_full-deepseek-coder-1.3b-base-8lr-12epochs-no-eos
  results: []
---


# lemexp-task1-v3-lemma_object_full-deepseek-coder-1.3b-base-8lr-12epochs-no-eos

This model is a PEFT adapter fine-tuned from [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base). The training dataset is not recorded in this card.
On the evaluation set, the model achieves the following results:
- Loss: 0.1733
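
Because the metadata lists `peft` as the library, the weights are presumably an adapter that has to be attached to the base model at load time. The following is a minimal sketch of that step, not a verified recipe: the helper name `load_adapter` is hypothetical, and the adapter repository id must be supplied by the reader, since this card does not state the hosting namespace.

```python
def load_adapter(adapter_id: str, base_id: str = "deepseek-ai/deepseek-coder-1.3b-base"):
    """Load the base model and attach the PEFT adapter weights (hypothetical helper)."""
    # Heavy dependencies are imported lazily so importing this module stays cheap.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer unchanged.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter stored under `adapter_id`.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


# Hypothetical usage; replace <namespace> with the account hosting the adapter:
# model, tokenizer = load_adapter(
#     "<namespace>/lemexp-task1-v3-lemma_object_full-deepseek-coder-1.3b-base-8lr-12epochs-no-eos"
# )
```

Calling the helper downloads both the base checkpoint and the adapter, so it needs network access and enough memory for a 1.3B-parameter model.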

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0008
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 12
- mixed_precision_training: Native AMP
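
The batch-size and scheduler entries above are related by simple arithmetic, which the sketch below spells out. It assumes a gradient accumulation factor of 1 and zero warmup steps, since neither is listed in this card, and the total step count is a rough estimate rather than a logged value. The function names are illustrative, not part of any library.

```python
PER_DEVICE_TRAIN_BATCH_SIZE = 2
NUM_DEVICES = 8
GRADIENT_ACCUMULATION_STEPS = 1  # assumption: not listed in this card
PEAK_LEARNING_RATE = 0.0008


def effective_batch_size(
    per_device: int = PER_DEVICE_TRAIN_BATCH_SIZE,
    devices: int = NUM_DEVICES,
    accumulation: int = GRADIENT_ACCUMULATION_STEPS,
) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * devices * accumulation


def linear_lr(
    step: int,
    total_steps: int,
    peak_lr: float = PEAK_LEARNING_RATE,
    warmup_steps: int = 0,  # assumption: no warmup is listed in this card
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / max(1, total_steps - warmup_steps))


# Rough estimate only: 12 epochs at about 15,570 optimizer steps per epoch.
ASSUMED_TOTAL_STEPS = 12 * 15_570

print(effective_batch_size())  # 2 * 8 * 1 = 16, matching total_train_batch_size
print(linear_lr(ASSUMED_TOTAL_STEPS // 2, ASSUMED_TOTAL_STEPS))  # half of the peak rate
```

With these assumptions the learning rate starts at 0.0008 and falls linearly to zero by the last optimizer step.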

### Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| 0.3906        | 0.2000  | 3114   | 0.3862          |
| 0.3635        | 0.4000  | 6228   | 0.3630          |
| 0.3503        | 0.6000  | 9342   | 0.3475          |
| 0.3403        | 0.8001  | 12456  | 0.3313          |
| 0.3331        | 1.0001  | 15570  | 0.3307          |
| 0.3239        | 1.2001  | 18684  | 0.3182          |
| 0.3155        | 1.4001  | 21798  | 0.3090          |
| 0.3143        | 1.6001  | 24912  | 0.3098          |
| 0.3114        | 1.8001  | 28026  | 0.3078          |
| 0.3050        | 2.0001  | 31140  | 0.3037          |
| 0.2934        | 2.2001  | 34254  | 0.2976          |
| 0.2983        | 2.4002  | 37368  | 0.2936          |
| 0.2950        | 2.6002  | 40482  | 0.2963          |
| 0.2932        | 2.8002  | 43596  | 0.2853          |
| 0.2866        | 3.0002  | 46710  | 0.2792          |
| 0.2784        | 3.2002  | 49824  | 0.2809          |
| 0.2760        | 3.4002  | 52938  | 0.2756          |
| 0.2727        | 3.6002  | 56052  | 0.2685          |
| 0.2704        | 3.8002  | 59166  | 0.2710          |
| 0.2711        | 4.0003  | 62280  | 0.2679          |
| 0.2601        | 4.2003  | 65394  | 0.2656          |
| 0.2574        | 4.4003  | 68508  | 0.2620          |
| 0.2550        | 4.6003  | 71622  | 0.2549          |
| 0.2555        | 4.8003  | 74736  | 0.2529          |
| 0.2546        | 5.0003  | 77850  | 0.2499          |
| 0.2437        | 5.2003  | 80964  | 0.2487          |
| 0.2457        | 5.4003  | 84078  | 0.2448          |
| 0.2424        | 5.6004  | 87192  | 0.2412          |
| 0.2360        | 5.8004  | 90306  | 0.2394          |
| 0.2378        | 6.0004  | 93420  | 0.2389          |
| 0.2286        | 6.2004  | 96534  | 0.2352          |
| 0.2288        | 6.4004  | 99648  | 0.2333          |
| 0.2276        | 6.6004  | 102762 | 0.2304          |
| 0.2233        | 6.8004  | 105876 | 0.2292          |
| 0.2247        | 7.0004  | 108990 | 0.2242          |
| 0.2120        | 7.2005  | 112104 | 0.2226          |
| 0.2091        | 7.4005  | 115218 | 0.2231          |
| 0.2113        | 7.6005  | 118332 | 0.2188          |
| 0.2086        | 7.8005  | 121446 | 0.2154          |
| 0.2045        | 8.0005  | 124560 | 0.2106          |
| 0.1936        | 8.2005  | 127674 | 0.2120          |
| 0.1942        | 8.4005  | 130788 | 0.2052          |
| 0.1940        | 8.6006  | 133902 | 0.2028          |
| 0.1971        | 8.8006  | 137016 | 0.2020          |
| 0.1878        | 9.0006  | 140130 | 0.1978          |
| 0.1787        | 9.2006  | 143244 | 0.1972          |
| 0.1788        | 9.4006  | 146358 | 0.1962          |
| 0.1768        | 9.6006  | 149472 | 0.1920          |
| 0.1741        | 9.8006  | 152586 | 0.1893          |
| 0.1720        | 10.0006 | 155700 | 0.1889          |
| 0.1624        | 10.2007 | 158814 | 0.1881          |
| 0.1624        | 10.4007 | 161928 | 0.1874          |
| 0.1603        | 10.6007 | 165042 | 0.1828          |
| 0.1578        | 10.8007 | 168156 | 0.1791          |
| 0.1547        | 11.0007 | 171270 | 0.1796          |
| 0.1482        | 11.2007 | 174384 | 0.1781          |
| 0.1462        | 11.4007 | 177498 | 0.1764          |
| 0.1404        | 11.6007 | 180612 | 0.1747          |
| 0.1436        | 11.8008 | 183726 | 0.1733          |
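
Two quantities can be read off the table with a little arithmetic: an approximate perplexity for the last validation loss and the number of optimizer steps per epoch. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The helper names are illustrative.

```python
import math

FINAL_VALIDATION_LOSS = 0.1733

# (epoch, step) pairs copied from the first, fifth, and last rows of the table.
EVAL_ROWS = [(0.2000, 3114), (1.0001, 15570), (11.8008, 183726)]


def perplexity(loss: float) -> float:
    """Perplexity implied by a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


def steps_per_epoch(epoch: float, step: int) -> float:
    """Optimizer steps per epoch implied by one logged (epoch, step) pair."""
    return step / epoch


print(f"perplexity: {perplexity(FINAL_VALIDATION_LOSS):.3f}")
for epoch, step in EVAL_ROWS:
    print(f"epoch {epoch}: about {steps_per_epoch(epoch, step):.0f} steps per epoch")
```

The three estimates of steps per epoch differ slightly because the epoch column is rounded to four decimals; all of them land near 15,570, which is consistent with an evaluation every 3,114 steps, or roughly five evaluations per epoch.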


### Framework versions

- PEFT 0.14.0
- Transformers 4.47.0
- PyTorch 2.5.1+cu124
- Datasets 4.2.0
- Tokenizers 0.21.0