Training in progress, epoch 5, checkpoint


Files changed (15)

checkpoint-77935/README.md +202 -0
checkpoint-77935/adapter_config.json +32 -0
checkpoint-77935/adapter_model.safetensors +3 -0
checkpoint-77935/optimizer.pt +3 -0
checkpoint-77935/rng_state_0.pth +3 -0
checkpoint-77935/rng_state_1.pth +3 -0
checkpoint-77935/rng_state_2.pth +3 -0
checkpoint-77935/rng_state_3.pth +3 -0
checkpoint-77935/rng_state_4.pth +3 -0
checkpoint-77935/rng_state_5.pth +3 -0
checkpoint-77935/rng_state_6.pth +3 -0
checkpoint-77935/rng_state_7.pth +3 -0
checkpoint-77935/scheduler.pt +3 -0
checkpoint-77935/trainer_state.json +1310 -0
checkpoint-77935/training_args.bin +3 -0

checkpoint-77935/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: deepseek-ai/deepseek-coder-1.3b-base
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

checkpoint-77935/adapter_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "deepseek-ai/deepseek-coder-1.3b-base",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
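
Reviewer note on the adapter config above: a minimal sketch of what `r`, `lora_alpha`, `use_rslora`, and `target_modules` imply for the size of the low-rank update. The helper names are hypothetical. The hidden size (2048) and layer count (24) are assumptions about `deepseek-ai/deepseek-coder-1.3b-base` that are not stated anywhere in this diff, so treat the parameter count as an estimate only.

```python
import json

# Values copied from adapter_config.json in this checkpoint.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["v_proj", "q_proj"]
}
""")

# ASSUMPTIONS (not in the diff): base-model dimensions.
ASSUMED_HIDDEN_SIZE = 2048
ASSUMED_NUM_LAYERS = 24


def lora_scaling(config: dict) -> float:
    """Factor applied to the low-rank update B @ A before it is added to W."""
    alpha, rank = config["lora_alpha"], config["r"]
    # Rank-stabilized LoRA divides by sqrt(r); standard LoRA divides by r.
    if config.get("use_rslora", False):
        return alpha / rank ** 0.5
    return alpha / rank


def lora_param_count(config: dict, hidden: int, layers: int) -> int:
    """Trainable parameters if every target module is a hidden x hidden projection."""
    per_module = config["r"] * (hidden + hidden)  # A is r x in, B is out x r
    return per_module * len(config["target_modules"]) * layers


scaling = lora_scaling(ADAPTER_CONFIG)
n_params = lora_param_count(ADAPTER_CONFIG, ASSUMED_HIDDEN_SIZE, ASSUMED_NUM_LAYERS)
fp32_bytes = n_params * 4
print(scaling, n_params, fp32_bytes)
```

Under these assumptions the estimate is about 1.57M trainable parameters, or roughly 6.29 MB in fp32. That is close to the 6304096-byte `adapter_model.safetensors` recorded in this commit, which is consistent with, but does not prove, the assumed dimensions.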

checkpoint-77935/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:90c937b1ace3d82aea3f5e6f49860052475bf46453933dc31418c5c31455d08c
+size 6304096

checkpoint-77935/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:52e2cbf94bb46158e43e1ace644eef96a3504190e7a3630e2f7c847fd6ad2a18
+size 12663802

checkpoint-77935/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a8a4d2360e35d4d85cb41db334c14b1b9847ea5f3647b6295f51c51efafd049
+size 15984

checkpoint-77935/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8a9bb38bb308ef3bf8b965ad30339b3ce3bb82e5bc241ae867161e6240cd8c27
+size 15984

checkpoint-77935/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7ba3c84940bb3eefe51970c5ec15a18f9cffb17489013c72edad6eac6fc9c59
+size 15984

checkpoint-77935/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1c9dce5f6893a8dcbdf2e5dd29a4e985dd8878f58ba0367f7349d4059030fa11
+size 15984

checkpoint-77935/rng_state_4.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0be3e6406c3b95455963d008b499949c290aba5664f058767310dcf8cf518bb6
+size 15984

checkpoint-77935/rng_state_5.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d0dcfdcff002cf15b3e419666a85940516439de0cde173d4204bb1f1f8f45cf
+size 15984

checkpoint-77935/rng_state_6.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8c31ba2ef6534c09ba67dc01c9f1f1246b7165f8bbe87096bf500d330ebd4d88
+size 15984

checkpoint-77935/rng_state_7.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:71d1de5498d7f750cda1f4a102b93845faee5cb4b48fa9d729f9691964e789d5
+size 15984

checkpoint-77935/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cce3b30420036d6a98353303d6ab8b5347197b80b8d705984f1955b6dc33e2b2
+size 1064
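
Reviewer note on the binary files above: each one is stored as a three-line Git LFS pointer (`version`, `oid`, `size`) rather than as the file itself. The sketch below shows one way to parse such a pointer and check downloaded bytes against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling, and the pointer text is the `scheduler.pt` entry from this commit.

```python
import hashlib

# Pointer text copied from the scheduler.pt entry in this commit.
SCHEDULER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cce3b30420036d6a98353303d6ab8b5347197b80b8d705984f1955b6dc33e2b2
size 1064
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if the bytes have the size and SHA-256 digest recorded in the pointer."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 pointers")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


pointer = parse_lfs_pointer(SCHEDULER_POINTER)
print(pointer["hash_algo"], pointer["size"])
```

A check of this kind only confirms that a download matches what the commit recorded. It says nothing about whether the checkpoint contents are usable.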

checkpoint-77935/trainer_state.json ADDED

	@@ -0,0 +1,1310 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 5.0,
+  "eval_steps": 3118,
+  "global_step": 77935,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03207801372938988,
+      "grad_norm": 0.8564678430557251,
+      "learning_rate": 0.0007978657428198713,
+      "loss": 0.5444,
+      "step": 500
+    },
+    {
+      "epoch": 0.06415602745877975,
+      "grad_norm": 0.7482010722160339,
+      "learning_rate": 0.0007957314856397425,
+      "loss": 0.461,
+      "step": 1000
+    },
+    {
+      "epoch": 0.09623404118816963,
+      "grad_norm": 0.9299066662788391,
+      "learning_rate": 0.0007935929513911166,
+      "loss": 0.4262,
+      "step": 1500
+    },
+    {
+      "epoch": 0.1283120549175595,
+      "grad_norm": 0.742969810962677,
+      "learning_rate": 0.0007914544171424906,
+      "loss": 0.4121,
+      "step": 2000
+    },
+    {
+      "epoch": 0.16039006864694938,
+      "grad_norm": 1.0129729509353638,
+      "learning_rate": 0.0007893158828938645,
+      "loss": 0.3987,
+      "step": 2500
+    },
+    {
+      "epoch": 0.19246808237633925,
+      "grad_norm": 0.7069104313850403,
+      "learning_rate": 0.0007871816257137359,
+      "loss": 0.385,
+      "step": 3000
+    },
+    {
+      "epoch": 0.20003849361647527,
+      "eval_loss": 0.36575084924697876,
+      "eval_runtime": 5.7691,
+      "eval_samples_per_second": 86.668,
+      "eval_steps_per_second": 5.547,
+      "step": 3118
+    },
+    {
+      "epoch": 0.22454609610572912,
+      "grad_norm": 0.8256860375404358,
+      "learning_rate": 0.0007850473685336071,
+      "loss": 0.3829,
+      "step": 3500
+    },
+    {
+      "epoch": 0.256624109835119,
+      "grad_norm": 0.9610025882720947,
+      "learning_rate": 0.0007829088342849811,
+      "loss": 0.3745,
+      "step": 4000
+    },
+    {
+      "epoch": 0.28870212356450886,
+      "grad_norm": 0.9205407500267029,
+      "learning_rate": 0.0007807703000363551,
+      "loss": 0.3767,
+      "step": 4500
+    },
+    {
+      "epoch": 0.32078013729389876,
+      "grad_norm": 1.0542463064193726,
+      "learning_rate": 0.0007786317657877291,
+      "loss": 0.3699,
+      "step": 5000
+    },
+    {
+      "epoch": 0.35285815102328866,
+      "grad_norm": 0.8851079344749451,
+      "learning_rate": 0.0007764932315391031,
+      "loss": 0.3684,
+      "step": 5500
+    },
+    {
+      "epoch": 0.3849361647526785,
+      "grad_norm": 0.8109485507011414,
+      "learning_rate": 0.0007743546972904772,
+      "loss": 0.3677,
+      "step": 6000
+    },
+    {
+      "epoch": 0.40007698723295054,
+      "eval_loss": 0.34624484181404114,
+      "eval_runtime": 5.7422,
+      "eval_samples_per_second": 87.074,
+      "eval_steps_per_second": 5.573,
+      "step": 6236
+    },
+    {
+      "epoch": 0.4170141784820684,
+      "grad_norm": 0.8642650246620178,
+      "learning_rate": 0.0007722161630418511,
+      "loss": 0.3578,
+      "step": 6500
+    },
+    {
+      "epoch": 0.44909219221145824,
+      "grad_norm": 0.8750723600387573,
+      "learning_rate": 0.0007700776287932251,
+      "loss": 0.358,
+      "step": 7000
+    },
+    {
+      "epoch": 0.48117020594084814,
+      "grad_norm": 0.9213278889656067,
+      "learning_rate": 0.0007679390945445992,
+      "loss": 0.3568,
+      "step": 7500
+    },
+    {
+      "epoch": 0.513248219670238,
+      "grad_norm": 1.209458589553833,
+      "learning_rate": 0.0007658048373644704,
+      "loss": 0.3509,
+      "step": 8000
+    },
+    {
+      "epoch": 0.5453262333996279,
+      "grad_norm": 0.8048808574676514,
+      "learning_rate": 0.0007636663031158445,
+      "loss": 0.3491,
+      "step": 8500
+    },
+    {
+      "epoch": 0.5774042471290177,
+      "grad_norm": 0.8589063882827759,
+      "learning_rate": 0.0007615277688672184,
+      "loss": 0.3488,
+      "step": 9000
+    },
+    {
+      "epoch": 0.6001154808494258,
+      "eval_loss": 0.3319118916988373,
+      "eval_runtime": 5.941,
+      "eval_samples_per_second": 84.161,
+      "eval_steps_per_second": 5.386,
+      "step": 9354
+    },
+    {
+      "epoch": 0.6094822608584076,
+      "grad_norm": 1.1071431636810303,
+      "learning_rate": 0.0007593892346185924,
+      "loss": 0.3475,
+      "step": 9500
+    },
+    {
+      "epoch": 0.6415602745877975,
+      "grad_norm": 1.250051736831665,
+      "learning_rate": 0.0007572549774384637,
+      "loss": 0.3434,
+      "step": 10000
+    },
+    {
+      "epoch": 0.6736382883171874,
+      "grad_norm": 0.9173659682273865,
+      "learning_rate": 0.0007551164431898377,
+      "loss": 0.3425,
+      "step": 10500
+    },
+    {
+      "epoch": 0.7057163020465773,
+      "grad_norm": 0.9546225666999817,
+      "learning_rate": 0.000752982186009709,
+      "loss": 0.3415,
+      "step": 11000
+    },
+    {
+      "epoch": 0.7377943157759671,
+      "grad_norm": 0.756817102432251,
+      "learning_rate": 0.000750843651761083,
+      "loss": 0.3366,
+      "step": 11500
+    },
+    {
+      "epoch": 0.769872329505357,
+      "grad_norm": 0.7823662757873535,
+      "learning_rate": 0.0007487051175124569,
+      "loss": 0.3397,
+      "step": 12000
+    },
+    {
+      "epoch": 0.8001539744659011,
+      "eval_loss": 0.33135783672332764,
+      "eval_runtime": 5.8442,
+      "eval_samples_per_second": 85.554,
+      "eval_steps_per_second": 5.475,
+      "step": 12472
+    },
+    {
+      "epoch": 0.8019503432347469,
+      "grad_norm": 1.3129873275756836,
+      "learning_rate": 0.000746566583263831,
+      "loss": 0.3342,
+      "step": 12500
+    },
+    {
+      "epoch": 0.8340283569641368,
+      "grad_norm": 1.0603216886520386,
+      "learning_rate": 0.000744428049015205,
+      "loss": 0.3372,
+      "step": 13000
+    },
+    {
+      "epoch": 0.8661063706935267,
+      "grad_norm": 0.9776498079299927,
+      "learning_rate": 0.0007422895147665791,
+      "loss": 0.3343,
+      "step": 13500
+    },
+    {
+      "epoch": 0.8981843844229165,
+      "grad_norm": 0.9603497385978699,
+      "learning_rate": 0.000740150980517953,
+      "loss": 0.332,
+      "step": 14000
+    },
+    {
+      "epoch": 0.9302623981523064,
+      "grad_norm": 1.0065163373947144,
+      "learning_rate": 0.0007380124462693271,
+      "loss": 0.335,
+      "step": 14500
+    },
+    {
+      "epoch": 0.9623404118816963,
+      "grad_norm": 0.947246789932251,
+      "learning_rate": 0.0007358739120207011,
+      "loss": 0.3322,
+      "step": 15000
+    },
+    {
+      "epoch": 0.9944184256110862,
+      "grad_norm": 1.138590693473816,
+      "learning_rate": 0.0007337396548405722,
+      "loss": 0.3329,
+      "step": 15500
+    },
+    {
+      "epoch": 1.0001924680823764,
+      "eval_loss": 0.3191450238227844,
+      "eval_runtime": 5.9188,
+      "eval_samples_per_second": 84.477,
+      "eval_steps_per_second": 5.407,
+      "step": 15590
+    },
+    {
+      "epoch": 1.026496439340476,
+      "grad_norm": 1.0730034112930298,
+      "learning_rate": 0.0007316053976604436,
+      "loss": 0.3214,
+      "step": 16000
+    },
+    {
+      "epoch": 1.058574453069866,
+      "grad_norm": 1.155540108680725,
+      "learning_rate": 0.0007294668634118175,
+      "loss": 0.3203,
+      "step": 16500
+    },
+    {
+      "epoch": 1.0906524667992559,
+      "grad_norm": 1.322080373764038,
+      "learning_rate": 0.0007273283291631916,
+      "loss": 0.32,
+      "step": 17000
+    },
+    {
+      "epoch": 1.1227304805286458,
+      "grad_norm": 1.028536319732666,
+      "learning_rate": 0.0007251897949145656,
+      "loss": 0.3206,
+      "step": 17500
+    },
+    {
+      "epoch": 1.1548084942580354,
+      "grad_norm": 1.0141762495040894,
+      "learning_rate": 0.0007230512606659396,
+      "loss": 0.3246,
+      "step": 18000
+    },
+    {
+      "epoch": 1.1868865079874253,
+      "grad_norm": 1.2617709636688232,
+      "learning_rate": 0.0007209127264173135,
+      "loss": 0.3179,
+      "step": 18500
+    },
+    {
+      "epoch": 1.2002309616988516,
+      "eval_loss": 0.3040919303894043,
+      "eval_runtime": 5.8325,
+      "eval_samples_per_second": 85.727,
+      "eval_steps_per_second": 5.487,
+      "step": 18708
+    },
+    {
+      "epoch": 1.2189645217168152,
+      "grad_norm": 0.9643025398254395,
+      "learning_rate": 0.0007187741921686877,
+      "loss": 0.3131,
+      "step": 19000
+    },
+    {
+      "epoch": 1.2510425354462051,
+      "grad_norm": 0.8644528388977051,
+      "learning_rate": 0.0007166399349885588,
+      "loss": 0.3197,
+      "step": 19500
+    },
+    {
+      "epoch": 1.283120549175595,
+      "grad_norm": 1.0242154598236084,
+      "learning_rate": 0.000714501400739933,
+      "loss": 0.3189,
+      "step": 20000
+    },
+    {
+      "epoch": 1.315198562904985,
+      "grad_norm": 0.7361490726470947,
+      "learning_rate": 0.0007123628664913069,
+      "loss": 0.3147,
+      "step": 20500
+    },
+    {
+      "epoch": 1.3472765766343748,
+      "grad_norm": 0.9061699509620667,
+      "learning_rate": 0.0007102243322426809,
+      "loss": 0.3168,
+      "step": 21000
+    },
+    {
+      "epoch": 1.3793545903637647,
+      "grad_norm": 0.7674645781517029,
+      "learning_rate": 0.000708085797994055,
+      "loss": 0.3144,
+      "step": 21500
+    },
+    {
+      "epoch": 1.400269455315327,
+      "eval_loss": 0.303521990776062,
+      "eval_runtime": 5.9543,
+      "eval_samples_per_second": 83.973,
+      "eval_steps_per_second": 5.374,
+      "step": 21826
+    },
+    {
+      "epoch": 1.4114326040931546,
+      "grad_norm": 1.2573202848434448,
+      "learning_rate": 0.0007059472637454289,
+      "loss": 0.3182,
+      "step": 22000
+    },
+    {
+      "epoch": 1.4435106178225445,
+      "grad_norm": 0.7668033838272095,
+      "learning_rate": 0.0007038087294968029,
+      "loss": 0.3087,
+      "step": 22500
+    },
+    {
+      "epoch": 1.4755886315519344,
+      "grad_norm": 0.7923159003257751,
+      "learning_rate": 0.0007016701952481769,
+      "loss": 0.3136,
+      "step": 23000
+    },
+    {
+      "epoch": 1.5076666452813243,
+      "grad_norm": 0.9079853296279907,
+      "learning_rate": 0.000699531660999551,
+      "loss": 0.3136,
+      "step": 23500
+    },
+    {
+      "epoch": 1.5397446590107142,
+      "grad_norm": 0.807373583316803,
+      "learning_rate": 0.0006973974038194221,
+      "loss": 0.3129,
+      "step": 24000
+    },
+    {
+      "epoch": 1.571822672740104,
+      "grad_norm": 1.0894283056259155,
+      "learning_rate": 0.0006952588695707963,
+      "loss": 0.3122,
+      "step": 24500
+    },
+    {
+      "epoch": 1.6003079489318022,
+      "eval_loss": 0.30009227991104126,
+      "eval_runtime": 5.9543,
+      "eval_samples_per_second": 83.973,
+      "eval_steps_per_second": 5.374,
+      "step": 24944
+    },
+    {
+      "epoch": 1.6039006864694938,
+      "grad_norm": 0.8650055527687073,
+      "learning_rate": 0.0006931203353221702,
+      "loss": 0.3128,
+      "step": 25000
+    },
+    {
+      "epoch": 1.6359787001988837,
+      "grad_norm": 1.0704525709152222,
+      "learning_rate": 0.0006909818010735442,
+      "loss": 0.3152,
+      "step": 25500
+    },
+    {
+      "epoch": 1.6680567139282736,
+      "grad_norm": 1.6046242713928223,
+      "learning_rate": 0.0006888518209619127,
+      "loss": 0.3153,
+      "step": 26000
+    },
+    {
+      "epoch": 1.7001347276576635,
+      "grad_norm": 0.891106367111206,
+      "learning_rate": 0.0006867132867132868,
+      "loss": 0.3123,
+      "step": 26500
+    },
+    {
+      "epoch": 1.7322127413870532,
+      "grad_norm": 0.8591095805168152,
+      "learning_rate": 0.0006845747524646608,
+      "loss": 0.31,
+      "step": 27000
+    },
+    {
+      "epoch": 1.764290755116443,
+      "grad_norm": 0.8793129920959473,
+      "learning_rate": 0.0006824362182160348,
+      "loss": 0.3135,
+      "step": 27500
+    },
+    {
+      "epoch": 1.796368768845833,
+      "grad_norm": 0.9400936961174011,
+      "learning_rate": 0.0006802976839674088,
+      "loss": 0.3077,
+      "step": 28000
+    },
+    {
+      "epoch": 1.8003464425482774,
+      "eval_loss": 0.29524701833724976,
+      "eval_runtime": 5.9571,
+      "eval_samples_per_second": 83.934,
+      "eval_steps_per_second": 5.372,
+      "step": 28062
+    },
+    {
+      "epoch": 1.8284467825752229,
+      "grad_norm": 0.7908840775489807,
+      "learning_rate": 0.0006781591497187827,
+      "loss": 0.309,
+      "step": 28500
+    },
+    {
+      "epoch": 1.8605247963046128,
+      "grad_norm": 1.1478577852249146,
+      "learning_rate": 0.0006760206154701568,
+      "loss": 0.305,
+      "step": 29000
+    },
+    {
+      "epoch": 1.8926028100340027,
+      "grad_norm": 0.7777372598648071,
+      "learning_rate": 0.0006738820812215308,
+      "loss": 0.3092,
+      "step": 29500
+    },
+    {
+      "epoch": 1.9246808237633926,
+      "grad_norm": 0.8342514634132385,
+      "learning_rate": 0.000671747824041402,
+      "loss": 0.306,
+      "step": 30000
+    },
+    {
+      "epoch": 1.9567588374927825,
+      "grad_norm": 0.9895392060279846,
+      "learning_rate": 0.0006696092897927761,
+      "loss": 0.3128,
+      "step": 30500
+    },
+    {
+      "epoch": 1.9888368512221724,
+      "grad_norm": 1.0536723136901855,
+      "learning_rate": 0.0006674750326126473,
+      "loss": 0.3066,
+      "step": 31000
+    },
+    {
+      "epoch": 2.0003849361647528,
+      "eval_loss": 0.2923731803894043,
+      "eval_runtime": 6.0971,
+      "eval_samples_per_second": 82.007,
+      "eval_steps_per_second": 5.248,
+      "step": 31180
+    },
+    {
+      "epoch": 2.0209148649515623,
+      "grad_norm": 1.40830397605896,
+      "learning_rate": 0.0006653364983640213,
+      "loss": 0.2977,
+      "step": 31500
+    },
+    {
+      "epoch": 2.052992878680952,
+      "grad_norm": 1.0089466571807861,
+      "learning_rate": 0.0006631979641153954,
+      "loss": 0.2961,
+      "step": 32000
+    },
+    {
+      "epoch": 2.085070892410342,
+      "grad_norm": 0.854210376739502,
+      "learning_rate": 0.0006610594298667693,
+      "loss": 0.294,
+      "step": 32500
+    },
+    {
+      "epoch": 2.117148906139732,
+      "grad_norm": 1.0218485593795776,
+      "learning_rate": 0.0006589251726866407,
+      "loss": 0.2958,
+      "step": 33000
+    },
+    {
+      "epoch": 2.149226919869122,
+      "grad_norm": 0.9581003189086914,
+      "learning_rate": 0.0006567866384380146,
+      "loss": 0.2998,
+      "step": 33500
+    },
+    {
+      "epoch": 2.1813049335985117,
+      "grad_norm": 0.9771293997764587,
+      "learning_rate": 0.0006546481041893886,
+      "loss": 0.2954,
+      "step": 34000
+    },
+    {
+      "epoch": 2.200423429781228,
+      "eval_loss": 0.2844325006008148,
+      "eval_runtime": 5.9595,
+      "eval_samples_per_second": 83.9,
+      "eval_steps_per_second": 5.37,
+      "step": 34298
+    },
+    {
+      "epoch": 2.2133829473279016,
+      "grad_norm": 1.3172814846038818,
+      "learning_rate": 0.0006525095699407627,
+      "loss": 0.2971,
+      "step": 34500
+    },
+    {
+      "epoch": 2.2454609610572915,
+      "grad_norm": 1.260122537612915,
+      "learning_rate": 0.0006503710356921366,
+      "loss": 0.2931,
+      "step": 35000
+    },
+    {
+      "epoch": 2.2775389747866814,
+      "grad_norm": 0.8652594089508057,
+      "learning_rate": 0.0006482325014435106,
+      "loss": 0.2941,
+      "step": 35500
+    },
+    {
+      "epoch": 2.309616988516071,
+      "grad_norm": 0.9302785396575928,
+      "learning_rate": 0.0006460982442633819,
+      "loss": 0.2951,
+      "step": 36000
+    },
+    {
+      "epoch": 2.341695002245461,
+      "grad_norm": 1.0370172262191772,
+      "learning_rate": 0.0006439597100147559,
+      "loss": 0.2951,
+      "step": 36500
+    },
+    {
+      "epoch": 2.3737730159748507,
+      "grad_norm": 0.6764707565307617,
+      "learning_rate": 0.0006418211757661299,
+      "loss": 0.2946,
+      "step": 37000
+    },
+    {
+      "epoch": 2.400461923397703,
+      "eval_loss": 0.2868812382221222,
+      "eval_runtime": 5.9808,
+      "eval_samples_per_second": 83.601,
+      "eval_steps_per_second": 5.35,
+      "step": 37416
+    },
+    {
+      "epoch": 2.4058510297042406,
+      "grad_norm": 0.9105328917503357,
+      "learning_rate": 0.000639682641517504,
+      "loss": 0.2966,
+      "step": 37500
+    },
+    {
+      "epoch": 2.4379290434336305,
+      "grad_norm": 0.8954421281814575,
+      "learning_rate": 0.0006375441072688779,
+      "loss": 0.2949,
+      "step": 38000
+    },
+    {
+      "epoch": 2.4700070571630204,
+      "grad_norm": 0.798809826374054,
+      "learning_rate": 0.000635405573020252,
+      "loss": 0.2923,
+      "step": 38500
+    },
+    {
+      "epoch": 2.5020850708924103,
+      "grad_norm": 1.027869701385498,
+      "learning_rate": 0.000633267038771626,
+      "loss": 0.2988,
+      "step": 39000
+    },
+    {
+      "epoch": 2.5341630846218,
+      "grad_norm": 1.412424921989441,
+      "learning_rate": 0.000631128504523,
+      "loss": 0.2876,
+      "step": 39500
+    },
+    {
+      "epoch": 2.56624109835119,
+      "grad_norm": 0.8323147296905518,
+      "learning_rate": 0.0006289942473428712,
+      "loss": 0.2873,
+      "step": 40000
+    },
+    {
+      "epoch": 2.59831911208058,
+      "grad_norm": 1.2047405242919922,
+      "learning_rate": 0.0006268599901627425,
+      "loss": 0.2876,
+      "step": 40500
+    },
+    {
+      "epoch": 2.6005004170141786,
+      "eval_loss": 0.2851209044456482,
+      "eval_runtime": 5.8822,
+      "eval_samples_per_second": 85.002,
+      "eval_steps_per_second": 5.44,
+      "step": 40534
+    },
+    {
+      "epoch": 2.63039712580997,
+      "grad_norm": 0.9327086806297302,
+      "learning_rate": 0.0006247214559141165,
+      "loss": 0.2948,
+      "step": 41000
+    },
+    {
+      "epoch": 2.6624751395393598,
+      "grad_norm": 0.9470818638801575,
+      "learning_rate": 0.0006225829216654905,
+      "loss": 0.2909,
+      "step": 41500
+    },
+    {
+      "epoch": 2.6945531532687497,
+      "grad_norm": 1.1972421407699585,
+      "learning_rate": 0.0006204486644853617,
+      "loss": 0.2953,
+      "step": 42000
+    },
+    {
+      "epoch": 2.7266311669981396,
+      "grad_norm": 0.9601694345474243,
+      "learning_rate": 0.0006183101302367357,
+      "loss": 0.2901,
+      "step": 42500
+    },
+    {
+      "epoch": 2.7587091807275295,
+      "grad_norm": 0.796318531036377,
+      "learning_rate": 0.0006161715959881098,
+      "loss": 0.2879,
+      "step": 43000
+    },
+    {
+      "epoch": 2.7907871944569194,
+      "grad_norm": 1.1968493461608887,
+      "learning_rate": 0.0006140330617394838,
+      "loss": 0.2917,
+      "step": 43500
+    },
+    {
+      "epoch": 2.800538910630654,
+      "eval_loss": 0.2793387174606323,
+      "eval_runtime": 5.7736,
+      "eval_samples_per_second": 86.601,
+      "eval_steps_per_second": 5.542,
+      "step": 43652
+    },
+    {
+      "epoch": 2.8228652081863093,
+      "grad_norm": 0.9883773326873779,
+      "learning_rate": 0.0006118945274908578,
+      "loss": 0.2864,
+      "step": 44000
+    },
+    {
+      "epoch": 2.8549432219156987,
+      "grad_norm": 0.7262638807296753,
+      "learning_rate": 0.0006097559932422318,
+      "loss": 0.2867,
+      "step": 44500
+    },
+    {
+      "epoch": 2.887021235645089,
+      "grad_norm": 0.9277000427246094,
+      "learning_rate": 0.0006076174589936059,
+      "loss": 0.2901,
+      "step": 45000
+    },
+    {
+      "epoch": 2.9190992493744785,
+      "grad_norm": 0.9092797636985779,
+      "learning_rate": 0.0006054789247449798,
+      "loss": 0.289,
+      "step": 45500
+    },
+    {
+      "epoch": 2.951177263103869,
+      "grad_norm": 1.1064151525497437,
+      "learning_rate": 0.0006033403904963538,
+      "loss": 0.2925,
+      "step": 46000
+    },
+    {
+      "epoch": 2.9832552768332583,
+      "grad_norm": 1.2269039154052734,
+      "learning_rate": 0.0006012018562477279,
+      "loss": 0.284,
+      "step": 46500
+    },
+    {
+      "epoch": 3.000577404247129,
+      "eval_loss": 0.2754272520542145,
+      "eval_runtime": 5.798,
+      "eval_samples_per_second": 86.237,
+      "eval_steps_per_second": 5.519,
+      "step": 46770
+    },
+    {
+      "epoch": 3.015333290562648,
+      "grad_norm": 0.8195134401321411,
+      "learning_rate": 0.0005990633219991018,
+      "loss": 0.2817,
+      "step": 47000
+    },
+    {
+      "epoch": 3.047411304292038,
+      "grad_norm": 0.6603811383247375,
+      "learning_rate": 0.0005969290648189731,
+      "loss": 0.2763,
+      "step": 47500
+    },
+    {
+      "epoch": 3.079489318021428,
+      "grad_norm": 1.3445206880569458,
+      "learning_rate": 0.0005947905305703471,
+      "loss": 0.2766,
+      "step": 48000
+    },
+    {
+      "epoch": 3.111567331750818,
+      "grad_norm": 0.9091941118240356,
+      "learning_rate": 0.0005926519963217211,
+      "loss": 0.276,
+      "step": 48500
+    },
+    {
+      "epoch": 3.143645345480208,
+      "grad_norm": 0.9965337514877319,
+      "learning_rate": 0.0005905134620730951,
+      "loss": 0.2795,
+      "step": 49000
+    },
+    {
+      "epoch": 3.1757233592095977,
+      "grad_norm": 0.9587671160697937,
+      "learning_rate": 0.0005883749278244692,
+      "loss": 0.2752,
+      "step": 49500
+    },
+    {
+      "epoch": 3.2006158978636043,
+      "eval_loss": 0.27157729864120483,
+      "eval_runtime": 5.9339,
+      "eval_samples_per_second": 84.261,
+      "eval_steps_per_second": 5.393,
+      "step": 49888
+    },
+    {
+      "epoch": 3.2078013729389876,
+      "grad_norm": 1.0777298212051392,
+      "learning_rate": 0.0005862363935758431,
+      "loss": 0.2774,
+      "step": 50000
+    },
+    {
+      "epoch": 3.2398793866683775,
+      "grad_norm": 1.258832335472107,
+      "learning_rate": 0.0005840978593272172,
+      "loss": 0.2731,
+      "step": 50500
+    },
+    {
+      "epoch": 3.2719574003977674,
+      "grad_norm": 0.8360182642936707,
+      "learning_rate": 0.0005819593250785912,
+      "loss": 0.2741,
+      "step": 51000
+    },
+    {
+      "epoch": 3.3040354141271573,
+      "grad_norm": 0.8642995357513428,
+      "learning_rate": 0.0005798250678984624,
+      "loss": 0.2764,
+      "step": 51500
+    },
+    {
+      "epoch": 3.336113427856547,
+      "grad_norm": 0.8430376052856445,
+      "learning_rate": 0.0005776865336498364,
+      "loss": 0.276,
+      "step": 52000
+    },
+    {
+      "epoch": 3.368191441585937,
+      "grad_norm": 1.0088149309158325,
+      "learning_rate": 0.0005755479994012104,
+      "loss": 0.2772,
+      "step": 52500
+    },
+    {
+      "epoch": 3.400269455315327,
+      "grad_norm": 1.1767189502716064,
+      "learning_rate": 0.0005734094651525844,
+      "loss": 0.2728,
+      "step": 53000
+    },
+    {
+      "epoch": 3.4006543914800798,
+      "eval_loss": 0.26524877548217773,
+      "eval_runtime": 6.0162,
+      "eval_samples_per_second": 83.109,
+      "eval_steps_per_second": 5.319,
+      "step": 53006
+    },
+    {
+      "epoch": 3.432347469044717,
+      "grad_norm": 0.8557692170143127,
+      "learning_rate": 0.0005712752079724557,
+      "loss": 0.274,
+      "step": 53500
+    },
+    {
+      "epoch": 3.464425482774107,
+      "grad_norm": 0.9875285625457764,
+      "learning_rate": 0.0005691366737238297,
+      "loss": 0.272,
+      "step": 54000
+    },
+    {
+      "epoch": 3.4965034965034967,
+      "grad_norm": 1.0980254411697388,
+      "learning_rate": 0.0005669981394752037,
+      "loss": 0.2793,
+      "step": 54500
+    },
+    {
+      "epoch": 3.528581510232886,
+      "grad_norm": 0.8793803453445435,
+      "learning_rate": 0.0005648596052265778,
+      "loss": 0.2746,
+      "step": 55000
+    },
+    {
+      "epoch": 3.5606595239622765,
+      "grad_norm": 0.9332100749015808,
+      "learning_rate": 0.000562725348046449,
+      "loss": 0.2737,
+      "step": 55500
+    },
+    {
+      "epoch": 3.592737537691666,
+      "grad_norm": 0.8742081522941589,
+      "learning_rate": 0.000560586813797823,
+      "loss": 0.2783,
+      "step": 56000
+    },
+    {
+      "epoch": 3.6006928850965547,
+      "eval_loss": 0.26503631472587585,
+      "eval_runtime": 5.8907,
+      "eval_samples_per_second": 84.88,
+      "eval_steps_per_second": 5.432,
+      "step": 56124
+    },
+    {
+      "epoch": 3.6248155514210563,
+      "grad_norm": 0.6926779747009277,
+      "learning_rate": 0.000558448279549197,
+      "loss": 0.2756,
+      "step": 56500
+    },
+    {
+      "epoch": 3.6568935651504457,
+      "grad_norm": 0.7763874530792236,
+      "learning_rate": 0.0005563097453005711,
+      "loss": 0.2733,
+      "step": 57000
+    },
+    {
+      "epoch": 3.6889715788798356,
+      "grad_norm": 0.7885093092918396,
+      "learning_rate": 0.000554171211051945,
+      "loss": 0.276,
+      "step": 57500
+    },
+    {
+      "epoch": 3.7210495926092255,
+      "grad_norm": 1.1363283395767212,
+      "learning_rate": 0.000552032676803319,
+      "loss": 0.2707,
+      "step": 58000
+    },
+    {
+      "epoch": 3.7531276063386154,
+      "grad_norm": 0.9212961196899414,
+      "learning_rate": 0.0005498941425546931,
+      "loss": 0.2749,
+      "step": 58500
+    },
+    {
+      "epoch": 3.7852056200680053,
+      "grad_norm": 0.9321721196174622,
+      "learning_rate": 0.000547755608306067,
+      "loss": 0.2794,
+      "step": 59000
+    },
+    {
+      "epoch": 3.80073137871303,
+      "eval_loss": 0.2648073732852936,
+      "eval_runtime": 6.0491,
+      "eval_samples_per_second": 82.658,
+      "eval_steps_per_second": 5.29,
+      "step": 59242
+    },
+    {
+      "epoch": 3.817283633797395,
+      "grad_norm": 0.9941183924674988,
+      "learning_rate": 0.0005456213511259384,
+      "loss": 0.2729,
+      "step": 59500
+    },
+    {
+      "epoch": 3.849361647526785,
+      "grad_norm": 0.7528358101844788,
+      "learning_rate": 0.0005434828168773123,
+      "loss": 0.2641,
+      "step": 60000
+    },
+    {
+      "epoch": 3.881439661256175,
+      "grad_norm": 0.9063546061515808,
+      "learning_rate": 0.0005413485596971835,
+      "loss": 0.2727,
+      "step": 60500
+    },
+    {
+      "epoch": 3.913517674985565,
+      "grad_norm": 0.8331403136253357,
+      "learning_rate": 0.0005392100254485576,
+      "loss": 0.271,
+      "step": 61000
+    },
+    {
+      "epoch": 3.945595688714955,
+      "grad_norm": 0.8270218372344971,
+      "learning_rate": 0.0005370714911999316,
+      "loss": 0.2741,
+      "step": 61500
+    },
+    {
+      "epoch": 3.9776737024443447,
+      "grad_norm": 0.897326648235321,
+      "learning_rate": 0.0005349329569513055,
+      "loss": 0.2763,
+      "step": 62000
+    },
+    {
+      "epoch": 4.0007698723295055,
+      "eval_loss": 0.2582992613315582,
+      "eval_runtime": 5.7482,
+      "eval_samples_per_second": 86.984,
+      "eval_steps_per_second": 5.567,
+      "step": 62360
+    },
+    {
+      "epoch": 4.009751716173734,
+      "grad_norm": 0.9636707901954651,
+      "learning_rate": 0.0005327944227026797,
+      "loss": 0.2647,
+      "step": 62500
+    },
+    {
+      "epoch": 4.0418297299031245,
+      "grad_norm": 0.7918885350227356,
+      "learning_rate": 0.0005306558884540536,
+      "loss": 0.2613,
+      "step": 63000
+    },
+    {
+      "epoch": 4.073907743632514,
+      "grad_norm": 0.6849018335342407,
+      "learning_rate": 0.0005285173542054276,
+      "loss": 0.2581,
+      "step": 63500
+    },
+    {
+      "epoch": 4.105985757361904,
+      "grad_norm": 1.1329964399337769,
+      "learning_rate": 0.0005263788199568017,
+      "loss": 0.2607,
+      "step": 64000
+    },
+    {
+      "epoch": 4.138063771091294,
+      "grad_norm": 0.8808174729347229,
+      "learning_rate": 0.0005242445627766729,
+      "loss": 0.2538,
+      "step": 64500
+    },
+    {
+      "epoch": 4.170141784820684,
+      "grad_norm": 1.0024505853652954,
+      "learning_rate": 0.0005221103055965442,
+      "loss": 0.2635,
+      "step": 65000
+    },
+    {
+      "epoch": 4.2008083659459805,
+      "eval_loss": 0.2572609782218933,
+      "eval_runtime": 5.9346,
+      "eval_samples_per_second": 84.252,
+      "eval_steps_per_second": 5.392,
+      "step": 65478
+    },
+    {
+      "epoch": 4.2022197985500735,
+      "grad_norm": 0.763168454170227,
+      "learning_rate": 0.0005199717713479182,
+      "loss": 0.2633,
+      "step": 65500
+    },
+    {
+      "epoch": 4.234297812279464,
+      "grad_norm": 0.9919458031654358,
+      "learning_rate": 0.0005178332370992922,
+      "loss": 0.2619,
+      "step": 66000
+    },
+    {
+      "epoch": 4.266375826008853,
+      "grad_norm": 0.7385163307189941,
+      "learning_rate": 0.0005156947028506661,
+      "loss": 0.2599,
+      "step": 66500
+    },
+    {
+      "epoch": 4.298453839738244,
+      "grad_norm": 0.9631339907646179,
+      "learning_rate": 0.0005135561686020402,
+      "loss": 0.2591,
+      "step": 67000
+    },
+    {
+      "epoch": 4.330531853467633,
+      "grad_norm": 1.0507500171661377,
+      "learning_rate": 0.0005114176343534142,
+      "loss": 0.2578,
+      "step": 67500
+    },
+    {
+      "epoch": 4.3626098671970235,
+      "grad_norm": 0.715898334980011,
+      "learning_rate": 0.0005092791001047882,
+      "loss": 0.2582,
+      "step": 68000
+    },
+    {
+      "epoch": 4.394687880926413,
+      "grad_norm": 1.0435441732406616,
+      "learning_rate": 0.0005071405658561622,
+      "loss": 0.2579,
+      "step": 68500
+    },
+    {
+      "epoch": 4.400846859562456,
+      "eval_loss": 0.25285640358924866,
+      "eval_runtime": 5.9918,
+      "eval_samples_per_second": 83.447,
+      "eval_steps_per_second": 5.341,
+      "step": 68596
+    },
+    {
+      "epoch": 4.426765894655803,
+      "grad_norm": 0.7549169063568115,
+      "learning_rate": 0.0005050020316075363,
+      "loss": 0.2582,
+      "step": 69000
+    },
+    {
+      "epoch": 4.458843908385193,
+      "grad_norm": 1.0151047706604004,
+      "learning_rate": 0.0005028677744274075,
+      "loss": 0.2607,
+      "step": 69500
+    },
+    {
+      "epoch": 4.490921922114583,
+      "grad_norm": 0.8479259014129639,
+      "learning_rate": 0.0005007292401787815,
+      "loss": 0.2618,
+      "step": 70000
+    },
+    {
+      "epoch": 4.5229999358439725,
+      "grad_norm": 0.7322765588760376,
+      "learning_rate": 0.0004985907059301555,
+      "loss": 0.2617,
+      "step": 70500
+    },
+    {
+      "epoch": 4.555077949573363,
+      "grad_norm": 0.7541555762290955,
+      "learning_rate": 0.0004964521716815294,
+      "loss": 0.2588,
+      "step": 71000
+    },
+    {
+      "epoch": 4.587155963302752,
+      "grad_norm": 0.8848826885223389,
+      "learning_rate": 0.0004943179145014008,
+      "loss": 0.2592,
+      "step": 71500
+    },
+    {
+      "epoch": 4.600885353178931,
+      "eval_loss": 0.2534917891025543,
+      "eval_runtime": 5.9901,
+      "eval_samples_per_second": 83.471,
+      "eval_steps_per_second": 5.342,
+      "step": 71714
+    },
+    {
+      "epoch": 4.619233977032142,
+      "grad_norm": 2.1285810470581055,
+      "learning_rate": 0.0004921793802527747,
+      "loss": 0.2558,
+      "step": 72000
+    },
+    {
+      "epoch": 4.651311990761532,
+      "grad_norm": 1.0571913719177246,
+      "learning_rate": 0.0004900408460041488,
+      "loss": 0.2604,
+      "step": 72500
+    },
+    {
+      "epoch": 4.683390004490922,
+      "grad_norm": 0.9374910593032837,
+      "learning_rate": 0.00048790231175552284,
+      "loss": 0.2548,
+      "step": 73000
+    },
+    {
+      "epoch": 4.715468018220312,
+      "grad_norm": 1.045569658279419,
+      "learning_rate": 0.0004857637775068968,
+      "loss": 0.2543,
+      "step": 73500
+    },
+    {
+      "epoch": 4.747546031949701,
+      "grad_norm": 0.8247463703155518,
+      "learning_rate": 0.00048363379739526535,
+      "loss": 0.2554,
+      "step": 74000
+    },
+    {
+      "epoch": 4.779624045679092,
+      "grad_norm": 0.7106033563613892,
+      "learning_rate": 0.0004814952631466393,
+      "loss": 0.2566,
+      "step": 74500
+    },
+    {
+      "epoch": 4.800923846795406,
+      "eval_loss": 0.2493596225976944,
+      "eval_runtime": 5.8672,
+      "eval_samples_per_second": 85.22,
+      "eval_steps_per_second": 5.454,
+      "step": 74832
+    },
+    {
+      "epoch": 4.811702059408481,
+      "grad_norm": 0.8932815194129944,
+      "learning_rate": 0.0004793567288980133,
+      "loss": 0.2519,
+      "step": 75000
+    },
+    {
+      "epoch": 4.8437800731378715,
+      "grad_norm": 0.8518146872520447,
+      "learning_rate": 0.00047721819464938733,
+      "loss": 0.2541,
+      "step": 75500
+    },
+    {
+      "epoch": 4.875858086867261,
+      "grad_norm": 1.0584919452667236,
+      "learning_rate": 0.0004750796604007613,
+      "loss": 0.2503,
+      "step": 76000
+    },
+    {
+      "epoch": 4.907936100596651,
+      "grad_norm": 0.8002163171768188,
+      "learning_rate": 0.0004729411261521353,
+      "loss": 0.2587,
+      "step": 76500
+    },
+    {
+      "epoch": 4.940014114326041,
+      "grad_norm": 0.7338119745254517,
+      "learning_rate": 0.00047080259190350937,
+      "loss": 0.2539,
+      "step": 77000
+    },
+    {
+      "epoch": 4.972092128055431,
+      "grad_norm": 0.8065565228462219,
+      "learning_rate": 0.00046866833472338057,
+      "loss": 0.2536,
+      "step": 77500
+    }
+  ],
+  "logging_steps": 500,
+  "max_steps": 187044,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 12,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 5.238186599325368e+18,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-77935/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37b90a284264b08902c1644c9f43994559f5b7a14e3b12bb5ba7570f7f5cdcae
+size 5496