# NB-ROBERTA Training Code
This is the current training code for the planned nb-roberta models.
We plan to run the following experiments:
| Name | Corpus | Pod size | Batch size | Learning rate | Number of steps |
|------|--------|----------|------------|---------------|-----------------|
| nb-roberta-base-old (C) | NbAiLab/nb_bert | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-base-ext (B) | NbAiLab/nbailab_extended | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-ext | NbAiLab/nbailab_extended | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
| nb-roberta-base-scandi | NbAiLab/scandinavian | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-scandi | NbAiLab/scandinavian | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
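As a sanity check on the batch-size arithmetic, here is a minimal Python sketch. The (per-device batch × devices × hosts) split is our reading of the `62*4*8`-style products in the table, and the 512 sequence length comes from the calculations below; treat it as an illustration, not the launch script.

```python
# Sanity check of the effective batch sizes, learning rates and token
# budgets in the table above. The (per_device, devices, hosts) split is
# our reading of the "62*4*8"-style products; sequence length 512 is
# taken from the calculations section below.
SEQ_LEN = 512

runs = {
    # name: (per-device batch, devices per host, hosts, learning rate, steps)
    "nb-roberta-base-old":     (62, 4, 8, 3e-4, 250_000),
    "nb-roberta-base-ext":     (62, 4, 8, 3e-4, 250_000),
    "nb-roberta-large-ext":    (32, 4, 8, 2e-4, 500_000),
    "nb-roberta-base-scandi":  (62, 4, 8, 3e-4, 250_000),
    "nb-roberta-large-scandi": (32, 4, 8, 2e-4, 500_000),
}

for name, (per_device, devices, hosts, lr, steps) in runs.items():
    batch = per_device * devices * hosts   # effective global batch size
    tokens = batch * SEQ_LEN * steps       # total training tokens
    print(f"{name}: batch={batch}, lr={lr}, ~{tokens / 1e9:.0f}B tokens")
```

This makes the scaling choice explicit: the base runs see roughly 254B tokens (2k batch, 250k steps) and the large runs roughly 262B tokens (1k batch, 500k steps), i.e. comparable token budgets.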
## Calculations
Some basic calculations that we used when estimating the number of training steps:
- The Scandinavian corpus is 85 GB
- The Scandinavian corpus contains 13B words
- With a conversion factor of 2.3, this is estimated to be around 30B tokens
- 30B tokens / (512 sequence length × 3000 batch size) ≈ 20,000 steps
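The same estimate as a small sketch, using only the numbers listed above (we read the result as the number of steps for one pass over the corpus):

```python
# Reproduce the step estimate above: words -> tokens -> steps for one
# pass over the Scandinavian corpus.
words = 13e9                  # 13B words in the corpus
tokens = words * 2.3          # ~30B tokens with the 2.3 conversion factor
seq_len = 512
batch_size = 3000
steps = tokens / (seq_len * batch_size)
print(f"~{tokens / 1e9:.0f}B tokens -> ~{steps:,.0f} steps")  # ~20,000
```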