# NB-ROBERTA Training Code
This is the current training code for the planned nb-roberta models.
We plan to run the following experiments:
| Name | Corpus | Pod size | Batch size | Learning rate | Number of steps |
|------|--------|----------|------------|---------------|-----------------|
| nb-roberta-base-old (C) | NbAiLab/nb_bert | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-base-ext (B) | NbAiLab/nbailab_extended | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-ext | NbAiLab/nbailab_extended | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
| nb-roberta-base-scandi | NbAiLab/scandinavian | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-scandi | NbAiLab/scandinavian | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
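As a sanity check on the batch-size arithmetic, here is a minimal Python sketch. The (per-device batch × devices × hosts) split is our reading of the `62*4*8`-style products in the table, and the 512 sequence length comes from the calculations below; treat it as an illustration, not the launch script.

```python
# Sanity check of the effective batch sizes, learning rates and token
# budgets in the table above. The (per_device, devices, hosts) split is
# our reading of the "62*4*8"-style products; sequence length 512 is
# taken from the calculations section below.
SEQ_LEN = 512

runs = {
    # name: (per-device batch, devices per host, hosts, learning rate, steps)
    "nb-roberta-base-old":     (62, 4, 8, 3e-4, 250_000),
    "nb-roberta-base-ext":     (62, 4, 8, 3e-4, 250_000),
    "nb-roberta-large-ext":    (32, 4, 8, 2e-4, 500_000),
    "nb-roberta-base-scandi":  (62, 4, 8, 3e-4, 250_000),
    "nb-roberta-large-scandi": (32, 4, 8, 2e-4, 500_000),
}

for name, (per_device, devices, hosts, lr, steps) in runs.items():
    batch = per_device * devices * hosts   # effective global batch size
    tokens = batch * SEQ_LEN * steps       # total training tokens
    print(f"{name}: batch={batch}, lr={lr}, ~{tokens / 1e9:.0f}B tokens")
```

This makes the scaling choice explicit: the base runs see roughly 254B tokens (2k batch, 250k steps) and the large runs roughly 262B tokens (1k batch, 500k steps), i.e. comparable token budgets.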
## Calculations
Some basic calculations that we used when estimating the number of training steps:
- The Scandinavian corpus is 85 GB
- The Scandinavian corpus contains 13B words
- With a conversion factor of 2.3, this is estimated to be around 30B tokens
- 30B tokens / (512 sequence length × 3000 batch size) ≈ 20,000 steps
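The same estimate as a small sketch, using only the numbers listed above (we read the result as the number of steps for one pass over the corpus):

```python
# Reproduce the step estimate above: words -> tokens -> steps for one
# pass over the Scandinavian corpus.
words = 13e9                  # 13B words in the corpus
tokens = words * 2.3          # ~30B tokens with the 2.3 conversion factor
seq_len = 512
batch_size = 3000
steps = tokens / (seq_len * batch_size)
print(f"~{tokens / 1e9:.0f}B tokens -> ~{steps:,.0f} steps")  # ~20,000
```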