Training in progress, epoch 3, checkpoint

Files changed (15)

checkpoint-46761/README.md +202 -0
checkpoint-46761/adapter_config.json +32 -0
checkpoint-46761/adapter_model.safetensors +3 -0
checkpoint-46761/optimizer.pt +3 -0
checkpoint-46761/rng_state_0.pth +3 -0
checkpoint-46761/rng_state_1.pth +3 -0
checkpoint-46761/rng_state_2.pth +3 -0
checkpoint-46761/rng_state_3.pth +3 -0
checkpoint-46761/rng_state_4.pth +3 -0
checkpoint-46761/rng_state_5.pth +3 -0
checkpoint-46761/rng_state_6.pth +3 -0
checkpoint-46761/rng_state_7.pth +3 -0
checkpoint-46761/scheduler.pt +3 -0
checkpoint-46761/trainer_state.json +796 -0
checkpoint-46761/training_args.bin +3 -0

checkpoint-46761/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: deepseek-ai/deepseek-coder-1.3b-base
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

checkpoint-46761/adapter_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "deepseek-ai/deepseek-coder-1.3b-base",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
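
The LoRA settings above fix the adapter's scaling factor and, combined with the base model's dimensions, its approximate size. The following is a minimal sketch of that arithmetic, not a statement of how PEFT computes it internally. `hidden_size = 2048` and `num_layers = 24` are assumptions about deepseek-ai/deepseek-coder-1.3b-base that do not appear in this diff; the sketch also assumes `q_proj` and `v_proj` are both square (hidden_size x hidden_size) and that the adapter weights are stored in fp32.

```python
# Values copied from adapter_config.json in this commit.
lora_alpha = 32
r = 8
target_modules = ["v_proj", "q_proj"]

# ASSUMPTIONS (not in this diff): dimensions of the base model.
# Check them against the base model's own config.json before relying on this.
hidden_size = 2048
num_layers = 24

# With use_rslora set to false, the LoRA update is scaled by lora_alpha / r.
scaling = lora_alpha / r

# Each adapted projection gets two low-rank matrices:
# A with shape (r, hidden_size) and B with shape (hidden_size, r).
params_per_module = r * hidden_size + hidden_size * r
total_params = num_layers * len(target_modules) * params_per_module

# ASSUMPTION: weights are stored as 4-byte fp32 values.
fp32_bytes = total_params * 4

print(f"scaling={scaling}, params={total_params}, bytes={fp32_bytes}")
```

Under these assumptions the estimate is 6,291,456 bytes, which is close to the 6,304,096 bytes recorded for `adapter_model.safetensors` in this commit; the remainder would plausibly be the safetensors header and metadata.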

checkpoint-46761/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2c4a6e6c845770f97a0b675fc7867fb42e4296891a49a289202f0da9f276d96d
+size 6304096
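
The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. As a rough illustration, a pointer can be parsed as below. The helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the text of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from adapter_model.safetensors in this commit,
# without the leading "+" diff markers.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2c4a6e6c845770f97a0b675fc7867fb42e4296891a49a289202f0da9f276d96d
size 6304096
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The same helper applies to the other pointers in this commit (`optimizer.pt`, `scheduler.pt`, the `rng_state_*.pth` files, and `training_args.bin`), since they share the same three-line layout.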

checkpoint-46761/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3b6dcaa91983ab3a135cc6bd97c61e3ddb1a5fda97bc8df90b85899945cb5e6c
+size 12663802

checkpoint-46761/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:693f798184d95d81164de5abdaf8cc9570e314bc6efeeccaa19fc16b466ebf22
+size 15984

checkpoint-46761/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d9c89b5e2bc0cdd34b6fa63a07ed0e6bcd9f5443470fb786bb32e37c10dc619b
+size 15984

checkpoint-46761/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c0f4f5a9cefd8c114ee05aeec6e0a8f5bd12fb3986d21419f49d6ea5c18742ad
+size 15984

checkpoint-46761/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:19b97381adf21c42e3b742cf1de27a2eb37f16b2beb2c690b2384098b6d83ce3
+size 15984

checkpoint-46761/rng_state_4.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e8bdf506c77ee283ec4c8ca6284e9822a2a8d9cf1b413cfdf1c9bad7512a301
+size 15984

checkpoint-46761/rng_state_5.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:55c3a3263ad60808b235af1306e59ec65396295f497340ac06a9021733537934
+size 15984

checkpoint-46761/rng_state_6.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e6deb52fee223960e30fc8e77293b71714c1fed4d38f4ab65188e940f5fbd68d
+size 15984

checkpoint-46761/rng_state_7.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e116f68910660ab5e371d76068ea85e682d3d804b3882a9e601495deabbe3f1b
+size 15984

checkpoint-46761/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:978a382bd2c159ab49de71cd49496f82804b1840915bfd54b43a489ced08d6c3
+size 1064

checkpoint-46761/trainer_state.json ADDED

	@@ -0,0 +1,796 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 3.0,
+  "eval_steps": 3118,
+  "global_step": 46761,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03207801372938988,
+      "grad_norm": 0.8564678430557251,
+      "learning_rate": 0.0007978657428198713,
+      "loss": 0.5444,
+      "step": 500
+    },
+    {
+      "epoch": 0.06415602745877975,
+      "grad_norm": 0.7482010722160339,
+      "learning_rate": 0.0007957314856397425,
+      "loss": 0.461,
+      "step": 1000
+    },
+    {
+      "epoch": 0.09623404118816963,
+      "grad_norm": 0.9299066662788391,
+      "learning_rate": 0.0007935929513911166,
+      "loss": 0.4262,
+      "step": 1500
+    },
+    {
+      "epoch": 0.1283120549175595,
+      "grad_norm": 0.742969810962677,
+      "learning_rate": 0.0007914544171424906,
+      "loss": 0.4121,
+      "step": 2000
+    },
+    {
+      "epoch": 0.16039006864694938,
+      "grad_norm": 1.0129729509353638,
+      "learning_rate": 0.0007893158828938645,
+      "loss": 0.3987,
+      "step": 2500
+    },
+    {
+      "epoch": 0.19246808237633925,
+      "grad_norm": 0.7069104313850403,
+      "learning_rate": 0.0007871816257137359,
+      "loss": 0.385,
+      "step": 3000
+    },
+    {
+      "epoch": 0.20003849361647527,
+      "eval_loss": 0.36575084924697876,
+      "eval_runtime": 5.7691,
+      "eval_samples_per_second": 86.668,
+      "eval_steps_per_second": 5.547,
+      "step": 3118
+    },
+    {
+      "epoch": 0.22454609610572912,
+      "grad_norm": 0.8256860375404358,
+      "learning_rate": 0.0007850473685336071,
+      "loss": 0.3829,
+      "step": 3500
+    },
+    {
+      "epoch": 0.256624109835119,
+      "grad_norm": 0.9610025882720947,
+      "learning_rate": 0.0007829088342849811,
+      "loss": 0.3745,
+      "step": 4000
+    },
+    {
+      "epoch": 0.28870212356450886,
+      "grad_norm": 0.9205407500267029,
+      "learning_rate": 0.0007807703000363551,
+      "loss": 0.3767,
+      "step": 4500
+    },
+    {
+      "epoch": 0.32078013729389876,
+      "grad_norm": 1.0542463064193726,
+      "learning_rate": 0.0007786317657877291,
+      "loss": 0.3699,
+      "step": 5000
+    },
+    {
+      "epoch": 0.35285815102328866,
+      "grad_norm": 0.8851079344749451,
+      "learning_rate": 0.0007764932315391031,
+      "loss": 0.3684,
+      "step": 5500
+    },
+    {
+      "epoch": 0.3849361647526785,
+      "grad_norm": 0.8109485507011414,
+      "learning_rate": 0.0007743546972904772,
+      "loss": 0.3677,
+      "step": 6000
+    },
+    {
+      "epoch": 0.40007698723295054,
+      "eval_loss": 0.34624484181404114,
+      "eval_runtime": 5.7422,
+      "eval_samples_per_second": 87.074,
+      "eval_steps_per_second": 5.573,
+      "step": 6236
+    },
+    {
+      "epoch": 0.4170141784820684,
+      "grad_norm": 0.8642650246620178,
+      "learning_rate": 0.0007722161630418511,
+      "loss": 0.3578,
+      "step": 6500
+    },
+    {
+      "epoch": 0.44909219221145824,
+      "grad_norm": 0.8750723600387573,
+      "learning_rate": 0.0007700776287932251,
+      "loss": 0.358,
+      "step": 7000
+    },
+    {
+      "epoch": 0.48117020594084814,
+      "grad_norm": 0.9213278889656067,
+      "learning_rate": 0.0007679390945445992,
+      "loss": 0.3568,
+      "step": 7500
+    },
+    {
+      "epoch": 0.513248219670238,
+      "grad_norm": 1.209458589553833,
+      "learning_rate": 0.0007658048373644704,
+      "loss": 0.3509,
+      "step": 8000
+    },
+    {
+      "epoch": 0.5453262333996279,
+      "grad_norm": 0.8048808574676514,
+      "learning_rate": 0.0007636663031158445,
+      "loss": 0.3491,
+      "step": 8500
+    },
+    {
+      "epoch": 0.5774042471290177,
+      "grad_norm": 0.8589063882827759,
+      "learning_rate": 0.0007615277688672184,
+      "loss": 0.3488,
+      "step": 9000
+    },
+    {
+      "epoch": 0.6001154808494258,
+      "eval_loss": 0.3319118916988373,
+      "eval_runtime": 5.941,
+      "eval_samples_per_second": 84.161,
+      "eval_steps_per_second": 5.386,
+      "step": 9354
+    },
+    {
+      "epoch": 0.6094822608584076,
+      "grad_norm": 1.1071431636810303,
+      "learning_rate": 0.0007593892346185924,
+      "loss": 0.3475,
+      "step": 9500
+    },
+    {
+      "epoch": 0.6415602745877975,
+      "grad_norm": 1.250051736831665,
+      "learning_rate": 0.0007572549774384637,
+      "loss": 0.3434,
+      "step": 10000
+    },
+    {
+      "epoch": 0.6736382883171874,
+      "grad_norm": 0.9173659682273865,
+      "learning_rate": 0.0007551164431898377,
+      "loss": 0.3425,
+      "step": 10500
+    },
+    {
+      "epoch": 0.7057163020465773,
+      "grad_norm": 0.9546225666999817,
+      "learning_rate": 0.000752982186009709,
+      "loss": 0.3415,
+      "step": 11000
+    },
+    {
+      "epoch": 0.7377943157759671,
+      "grad_norm": 0.756817102432251,
+      "learning_rate": 0.000750843651761083,
+      "loss": 0.3366,
+      "step": 11500
+    },
+    {
+      "epoch": 0.769872329505357,
+      "grad_norm": 0.7823662757873535,
+      "learning_rate": 0.0007487051175124569,
+      "loss": 0.3397,
+      "step": 12000
+    },
+    {
+      "epoch": 0.8001539744659011,
+      "eval_loss": 0.33135783672332764,
+      "eval_runtime": 5.8442,
+      "eval_samples_per_second": 85.554,
+      "eval_steps_per_second": 5.475,
+      "step": 12472
+    },
+    {
+      "epoch": 0.8019503432347469,
+      "grad_norm": 1.3129873275756836,
+      "learning_rate": 0.000746566583263831,
+      "loss": 0.3342,
+      "step": 12500
+    },
+    {
+      "epoch": 0.8340283569641368,
+      "grad_norm": 1.0603216886520386,
+      "learning_rate": 0.000744428049015205,
+      "loss": 0.3372,
+      "step": 13000
+    },
+    {
+      "epoch": 0.8661063706935267,
+      "grad_norm": 0.9776498079299927,
+      "learning_rate": 0.0007422895147665791,
+      "loss": 0.3343,
+      "step": 13500
+    },
+    {
+      "epoch": 0.8981843844229165,
+      "grad_norm": 0.9603497385978699,
+      "learning_rate": 0.000740150980517953,
+      "loss": 0.332,
+      "step": 14000
+    },
+    {
+      "epoch": 0.9302623981523064,
+      "grad_norm": 1.0065163373947144,
+      "learning_rate": 0.0007380124462693271,
+      "loss": 0.335,
+      "step": 14500
+    },
+    {
+      "epoch": 0.9623404118816963,
+      "grad_norm": 0.947246789932251,
+      "learning_rate": 0.0007358739120207011,
+      "loss": 0.3322,
+      "step": 15000
+    },
+    {
+      "epoch": 0.9944184256110862,
+      "grad_norm": 1.138590693473816,
+      "learning_rate": 0.0007337396548405722,
+      "loss": 0.3329,
+      "step": 15500
+    },
+    {
+      "epoch": 1.0001924680823764,
+      "eval_loss": 0.3191450238227844,
+      "eval_runtime": 5.9188,
+      "eval_samples_per_second": 84.477,
+      "eval_steps_per_second": 5.407,
+      "step": 15590
+    },
+    {
+      "epoch": 1.026496439340476,
+      "grad_norm": 1.0730034112930298,
+      "learning_rate": 0.0007316053976604436,
+      "loss": 0.3214,
+      "step": 16000
+    },
+    {
+      "epoch": 1.058574453069866,
+      "grad_norm": 1.155540108680725,
+      "learning_rate": 0.0007294668634118175,
+      "loss": 0.3203,
+      "step": 16500
+    },
+    {
+      "epoch": 1.0906524667992559,
+      "grad_norm": 1.322080373764038,
+      "learning_rate": 0.0007273283291631916,
+      "loss": 0.32,
+      "step": 17000
+    },
+    {
+      "epoch": 1.1227304805286458,
+      "grad_norm": 1.028536319732666,
+      "learning_rate": 0.0007251897949145656,
+      "loss": 0.3206,
+      "step": 17500
+    },
+    {
+      "epoch": 1.1548084942580354,
+      "grad_norm": 1.0141762495040894,
+      "learning_rate": 0.0007230512606659396,
+      "loss": 0.3246,
+      "step": 18000
+    },
+    {
+      "epoch": 1.1868865079874253,
+      "grad_norm": 1.2617709636688232,
+      "learning_rate": 0.0007209127264173135,
+      "loss": 0.3179,
+      "step": 18500
+    },
+    {
+      "epoch": 1.2002309616988516,
+      "eval_loss": 0.3040919303894043,
+      "eval_runtime": 5.8325,
+      "eval_samples_per_second": 85.727,
+      "eval_steps_per_second": 5.487,
+      "step": 18708
+    },
+    {
+      "epoch": 1.2189645217168152,
+      "grad_norm": 0.9643025398254395,
+      "learning_rate": 0.0007187741921686877,
+      "loss": 0.3131,
+      "step": 19000
+    },
+    {
+      "epoch": 1.2510425354462051,
+      "grad_norm": 0.8644528388977051,
+      "learning_rate": 0.0007166399349885588,
+      "loss": 0.3197,
+      "step": 19500
+    },
+    {
+      "epoch": 1.283120549175595,
+      "grad_norm": 1.0242154598236084,
+      "learning_rate": 0.000714501400739933,
+      "loss": 0.3189,
+      "step": 20000
+    },
+    {
+      "epoch": 1.315198562904985,
+      "grad_norm": 0.7361490726470947,
+      "learning_rate": 0.0007123628664913069,
+      "loss": 0.3147,
+      "step": 20500
+    },
+    {
+      "epoch": 1.3472765766343748,
+      "grad_norm": 0.9061699509620667,
+      "learning_rate": 0.0007102243322426809,
+      "loss": 0.3168,
+      "step": 21000
+    },
+    {
+      "epoch": 1.3793545903637647,
+      "grad_norm": 0.7674645781517029,
+      "learning_rate": 0.000708085797994055,
+      "loss": 0.3144,
+      "step": 21500
+    },
+    {
+      "epoch": 1.400269455315327,
+      "eval_loss": 0.303521990776062,
+      "eval_runtime": 5.9543,
+      "eval_samples_per_second": 83.973,
+      "eval_steps_per_second": 5.374,
+      "step": 21826
+    },
+    {
+      "epoch": 1.4114326040931546,
+      "grad_norm": 1.2573202848434448,
+      "learning_rate": 0.0007059472637454289,
+      "loss": 0.3182,
+      "step": 22000
+    },
+    {
+      "epoch": 1.4435106178225445,
+      "grad_norm": 0.7668033838272095,
+      "learning_rate": 0.0007038087294968029,
+      "loss": 0.3087,
+      "step": 22500
+    },
+    {
+      "epoch": 1.4755886315519344,
+      "grad_norm": 0.7923159003257751,
+      "learning_rate": 0.0007016701952481769,
+      "loss": 0.3136,
+      "step": 23000
+    },
+    {
+      "epoch": 1.5076666452813243,
+      "grad_norm": 0.9079853296279907,
+      "learning_rate": 0.000699531660999551,
+      "loss": 0.3136,
+      "step": 23500
+    },
+    {
+      "epoch": 1.5397446590107142,
+      "grad_norm": 0.807373583316803,
+      "learning_rate": 0.0006973974038194221,
+      "loss": 0.3129,
+      "step": 24000
+    },
+    {
+      "epoch": 1.571822672740104,
+      "grad_norm": 1.0894283056259155,
+      "learning_rate": 0.0006952588695707963,
+      "loss": 0.3122,
+      "step": 24500
+    },
+    {
+      "epoch": 1.6003079489318022,
+      "eval_loss": 0.30009227991104126,
+      "eval_runtime": 5.9543,
+      "eval_samples_per_second": 83.973,
+      "eval_steps_per_second": 5.374,
+      "step": 24944
+    },
+    {
+      "epoch": 1.6039006864694938,
+      "grad_norm": 0.8650055527687073,
+      "learning_rate": 0.0006931203353221702,
+      "loss": 0.3128,
+      "step": 25000
+    },
+    {
+      "epoch": 1.6359787001988837,
+      "grad_norm": 1.0704525709152222,
+      "learning_rate": 0.0006909818010735442,
+      "loss": 0.3152,
+      "step": 25500
+    },
+    {
+      "epoch": 1.6680567139282736,
+      "grad_norm": 1.6046242713928223,
+      "learning_rate": 0.0006888518209619127,
+      "loss": 0.3153,
+      "step": 26000
+    },
+    {
+      "epoch": 1.7001347276576635,
+      "grad_norm": 0.891106367111206,
+      "learning_rate": 0.0006867132867132868,
+      "loss": 0.3123,
+      "step": 26500
+    },
+    {
+      "epoch": 1.7322127413870532,
+      "grad_norm": 0.8591095805168152,
+      "learning_rate": 0.0006845747524646608,
+      "loss": 0.31,
+      "step": 27000
+    },
+    {
+      "epoch": 1.764290755116443,
+      "grad_norm": 0.8793129920959473,
+      "learning_rate": 0.0006824362182160348,
+      "loss": 0.3135,
+      "step": 27500
+    },
+    {
+      "epoch": 1.796368768845833,
+      "grad_norm": 0.9400936961174011,
+      "learning_rate": 0.0006802976839674088,
+      "loss": 0.3077,
+      "step": 28000
+    },
+    {
+      "epoch": 1.8003464425482774,
+      "eval_loss": 0.29524701833724976,
+      "eval_runtime": 5.9571,
+      "eval_samples_per_second": 83.934,
+      "eval_steps_per_second": 5.372,
+      "step": 28062
+    },
+    {
+      "epoch": 1.8284467825752229,
+      "grad_norm": 0.7908840775489807,
+      "learning_rate": 0.0006781591497187827,
+      "loss": 0.309,
+      "step": 28500
+    },
+    {
+      "epoch": 1.8605247963046128,
+      "grad_norm": 1.1478577852249146,
+      "learning_rate": 0.0006760206154701568,
+      "loss": 0.305,
+      "step": 29000
+    },
+    {
+      "epoch": 1.8926028100340027,
+      "grad_norm": 0.7777372598648071,
+      "learning_rate": 0.0006738820812215308,
+      "loss": 0.3092,
+      "step": 29500
+    },
+    {
+      "epoch": 1.9246808237633926,
+      "grad_norm": 0.8342514634132385,
+      "learning_rate": 0.000671747824041402,
+      "loss": 0.306,
+      "step": 30000
+    },
+    {
+      "epoch": 1.9567588374927825,
+      "grad_norm": 0.9895392060279846,
+      "learning_rate": 0.0006696092897927761,
+      "loss": 0.3128,
+      "step": 30500
+    },
+    {
+      "epoch": 1.9888368512221724,
+      "grad_norm": 1.0536723136901855,
+      "learning_rate": 0.0006674750326126473,
+      "loss": 0.3066,
+      "step": 31000
+    },
+    {
+      "epoch": 2.0003849361647528,
+      "eval_loss": 0.2923731803894043,
+      "eval_runtime": 6.0971,
+      "eval_samples_per_second": 82.007,
+      "eval_steps_per_second": 5.248,
+      "step": 31180
+    },
+    {
+      "epoch": 2.0209148649515623,
+      "grad_norm": 1.40830397605896,
+      "learning_rate": 0.0006653364983640213,
+      "loss": 0.2977,
+      "step": 31500
+    },
+    {
+      "epoch": 2.052992878680952,
+      "grad_norm": 1.0089466571807861,
+      "learning_rate": 0.0006631979641153954,
+      "loss": 0.2961,
+      "step": 32000
+    },
+    {
+      "epoch": 2.085070892410342,
+      "grad_norm": 0.854210376739502,
+      "learning_rate": 0.0006610594298667693,
+      "loss": 0.294,
+      "step": 32500
+    },
+    {
+      "epoch": 2.117148906139732,
+      "grad_norm": 1.0218485593795776,
+      "learning_rate": 0.0006589251726866407,
+      "loss": 0.2958,
+      "step": 33000
+    },
+    {
+      "epoch": 2.149226919869122,
+      "grad_norm": 0.9581003189086914,
+      "learning_rate": 0.0006567866384380146,
+      "loss": 0.2998,
+      "step": 33500
+    },
+    {
+      "epoch": 2.1813049335985117,
+      "grad_norm": 0.9771293997764587,
+      "learning_rate": 0.0006546481041893886,
+      "loss": 0.2954,
+      "step": 34000
+    },
+    {
+      "epoch": 2.200423429781228,
+      "eval_loss": 0.2844325006008148,
+      "eval_runtime": 5.9595,
+      "eval_samples_per_second": 83.9,
+      "eval_steps_per_second": 5.37,
+      "step": 34298
+    },
+    {
+      "epoch": 2.2133829473279016,
+      "grad_norm": 1.3172814846038818,
+      "learning_rate": 0.0006525095699407627,
+      "loss": 0.2971,
+      "step": 34500
+    },
+    {
+      "epoch": 2.2454609610572915,
+      "grad_norm": 1.260122537612915,
+      "learning_rate": 0.0006503710356921366,
+      "loss": 0.2931,
+      "step": 35000
+    },
+    {
+      "epoch": 2.2775389747866814,
+      "grad_norm": 0.8652594089508057,
+      "learning_rate": 0.0006482325014435106,
+      "loss": 0.2941,
+      "step": 35500
+    },
+    {
+      "epoch": 2.309616988516071,
+      "grad_norm": 0.9302785396575928,
+      "learning_rate": 0.0006460982442633819,
+      "loss": 0.2951,
+      "step": 36000
+    },
+    {
+      "epoch": 2.341695002245461,
+      "grad_norm": 1.0370172262191772,
+      "learning_rate": 0.0006439597100147559,
+      "loss": 0.2951,
+      "step": 36500
+    },
+    {
+      "epoch": 2.3737730159748507,
+      "grad_norm": 0.6764707565307617,
+      "learning_rate": 0.0006418211757661299,
+      "loss": 0.2946,
+      "step": 37000
+    },
+    {
+      "epoch": 2.400461923397703,
+      "eval_loss": 0.2868812382221222,
+      "eval_runtime": 5.9808,
+      "eval_samples_per_second": 83.601,
+      "eval_steps_per_second": 5.35,
+      "step": 37416
+    },
+    {
+      "epoch": 2.4058510297042406,
+      "grad_norm": 0.9105328917503357,
+      "learning_rate": 0.000639682641517504,
+      "loss": 0.2966,
+      "step": 37500
+    },
+    {
+      "epoch": 2.4379290434336305,
+      "grad_norm": 0.8954421281814575,
+      "learning_rate": 0.0006375441072688779,
+      "loss": 0.2949,
+      "step": 38000
+    },
+    {
+      "epoch": 2.4700070571630204,
+      "grad_norm": 0.798809826374054,
+      "learning_rate": 0.000635405573020252,
+      "loss": 0.2923,
+      "step": 38500
+    },
+    {
+      "epoch": 2.5020850708924103,
+      "grad_norm": 1.027869701385498,
+      "learning_rate": 0.000633267038771626,
+      "loss": 0.2988,
+      "step": 39000
+    },
+    {
+      "epoch": 2.5341630846218,
+      "grad_norm": 1.412424921989441,
+      "learning_rate": 0.000631128504523,
+      "loss": 0.2876,
+      "step": 39500
+    },
+    {
+      "epoch": 2.56624109835119,
+      "grad_norm": 0.8323147296905518,
+      "learning_rate": 0.0006289942473428712,
+      "loss": 0.2873,
+      "step": 40000
+    },
+    {
+      "epoch": 2.59831911208058,
+      "grad_norm": 1.2047405242919922,
+      "learning_rate": 0.0006268599901627425,
+      "loss": 0.2876,
+      "step": 40500
+    },
+    {
+      "epoch": 2.6005004170141786,
+      "eval_loss": 0.2851209044456482,
+      "eval_runtime": 5.8822,
+      "eval_samples_per_second": 85.002,
+      "eval_steps_per_second": 5.44,
+      "step": 40534
+    },
+    {
+      "epoch": 2.63039712580997,
+      "grad_norm": 0.9327086806297302,
+      "learning_rate": 0.0006247214559141165,
+      "loss": 0.2948,
+      "step": 41000
+    },
+    {
+      "epoch": 2.6624751395393598,
+      "grad_norm": 0.9470818638801575,
+      "learning_rate": 0.0006225829216654905,
+      "loss": 0.2909,
+      "step": 41500
+    },
+    {
+      "epoch": 2.6945531532687497,
+      "grad_norm": 1.1972421407699585,
+      "learning_rate": 0.0006204486644853617,
+      "loss": 0.2953,
+      "step": 42000
+    },
+    {
+      "epoch": 2.7266311669981396,
+      "grad_norm": 0.9601694345474243,
+      "learning_rate": 0.0006183101302367357,
+      "loss": 0.2901,
+      "step": 42500
+    },
+    {
+      "epoch": 2.7587091807275295,
+      "grad_norm": 0.796318531036377,
+      "learning_rate": 0.0006161715959881098,
+      "loss": 0.2879,
+      "step": 43000
+    },
+    {
+      "epoch": 2.7907871944569194,
+      "grad_norm": 1.1968493461608887,
+      "learning_rate": 0.0006140330617394838,
+      "loss": 0.2917,
+      "step": 43500
+    },
+    {
+      "epoch": 2.800538910630654,
+      "eval_loss": 0.2793387174606323,
+      "eval_runtime": 5.7736,
+      "eval_samples_per_second": 86.601,
+      "eval_steps_per_second": 5.542,
+      "step": 43652
+    },
+    {
+      "epoch": 2.8228652081863093,
+      "grad_norm": 0.9883773326873779,
+      "learning_rate": 0.0006118945274908578,
+      "loss": 0.2864,
+      "step": 44000
+    },
+    {
+      "epoch": 2.8549432219156987,
+      "grad_norm": 0.7262638807296753,
+      "learning_rate": 0.0006097559932422318,
+      "loss": 0.2867,
+      "step": 44500
+    },
+    {
+      "epoch": 2.887021235645089,
+      "grad_norm": 0.9277000427246094,
+      "learning_rate": 0.0006076174589936059,
+      "loss": 0.2901,
+      "step": 45000
+    },
+    {
+      "epoch": 2.9190992493744785,
+      "grad_norm": 0.9092797636985779,
+      "learning_rate": 0.0006054789247449798,
+      "loss": 0.289,
+      "step": 45500
+    },
+    {
+      "epoch": 2.951177263103869,
+      "grad_norm": 1.1064151525497437,
+      "learning_rate": 0.0006033403904963538,
+      "loss": 0.2925,
+      "step": 46000
+    },
+    {
+      "epoch": 2.9832552768332583,
+      "grad_norm": 1.2269039154052734,
+      "learning_rate": 0.0006012018562477279,
+      "loss": 0.284,
+      "step": 46500
+    }
+  ],
+  "logging_steps": 500,
+  "max_steps": 187044,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 12,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.143201808684417e+18,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
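
The top-level fields of `trainer_state.json` make it possible to work out how far the run has progressed, and `log_history` mixes training entries (with `loss`) and evaluation entries (with `eval_loss`). The sketch below shows one way to read these values, using a three-entry excerpt copied from the file above. The linear learning-rate decay it estimates is an inference from the logged values only; the scheduler type and initial learning rate are not stated anywhere in this diff.

```python
import json

# Excerpt of trainer_state.json from this commit; log_history is trimmed
# to three entries for brevity.
state = json.loads("""
{
  "epoch": 3.0,
  "eval_steps": 3118,
  "global_step": 46761,
  "max_steps": 187044,
  "num_train_epochs": 12,
  "log_history": [
    {"epoch": 0.03207801372938988, "learning_rate": 0.0007978657428198713,
     "loss": 0.5444, "step": 500},
    {"epoch": 0.20003849361647527, "eval_loss": 0.36575084924697876,
     "step": 3118},
    {"epoch": 2.9832552768332583, "learning_rate": 0.0006012018562477279,
     "loss": 0.284, "step": 46500}
  ]
}
""")

# Optimizer steps per epoch and fraction of the planned run completed.
steps_per_epoch = state["max_steps"] // state["num_train_epochs"]
progress = state["global_step"] / state["max_steps"]

# Training entries carry "loss"; evaluation entries carry "eval_loss".
train_logs = [e for e in state["log_history"] if "loss" in e]
eval_logs = [e for e in state["log_history"] if "eval_loss" in e]

# Estimate the per-step learning-rate decrease between the first and last
# training entries, then extrapolate back to step 0.
# ASSUMPTION: the schedule is linear; this is inferred, not documented.
first, last = train_logs[0], train_logs[-1]
lr_drop_per_step = (first["learning_rate"] - last["learning_rate"]) / (
    last["step"] - first["step"]
)
lr_at_step_0 = first["learning_rate"] + first["step"] * lr_drop_per_step

print(steps_per_epoch, progress, len(train_logs), len(eval_logs))
print(f"estimated initial learning rate: {lr_at_step_0:.2e}")
```

With the values in this commit, each epoch is 15,587 steps, so step 46,761 is exactly the end of epoch 3 and one quarter of the planned 12 epochs. With `eval_steps` at 3,118, evaluation runs roughly five times per epoch.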

checkpoint-46761/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37b90a284264b08902c1644c9f43994559f5b7a14e3b12bb5ba7570f7f5cdcae
+size 5496