Hi,
I’m yusukemori.
While checking the model descriptions in the pretrained_models list (https://huggingface.co/transformers/pretrained_models.html),
I found what seems to be a mistake regarding BART.
For facebook/bart-large-cnn, the description reads:
12-layer, 1024-hidden, 16-heads, 406M parameters (same as base)
bart-large base architecture finetuned on cnn summarization task
If my understanding is correct, shouldn't "12-layer" and "(same as base)" instead be "24-layer" and "(same as large)"?
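For reference, here is a minimal sketch of how I would check this with the transformers library (just my own verification, not something from the docs; the attribute names assume the current BartConfig):

```python
from transformers import AutoConfig

# Load the configuration of facebook/bart-large-cnn from the Hub
config = AutoConfig.from_pretrained("facebook/bart-large-cnn")

# The pretrained_models page counts encoder and decoder layers together,
# so bart-large would be 12 encoder + 12 decoder = 24 layers.
print("encoder_layers:", config.encoder_layers)             # expected: 12
print("decoder_layers:", config.decoder_layers)             # expected: 12
print("hidden size (d_model):", config.d_model)             # expected: 1024
print("attention heads:", config.encoder_attention_heads)   # expected: 16
```

If this prints 12 encoder layers and 12 decoder layers with d_model=1024, the checkpoint matches the bart-large architecture rather than bart-base (which uses 6+6 layers and d_model=768).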
I’m sorry if my understanding is wrong, or if someone has already noticed and fixed it.
Thank you in advance.
yusukemori
(I’m worried, but I’m excited about Transformers!)