Hi,
I’m yusukemori.
While checking the model descriptions in the pretrained_models list (https://huggingface.co/transformers/pretrained_models.html),
I found what seems to be a mistake regarding BART.
For facebook/bart-large-cnn, the description reads:
12-layer, 1024-hidden, 16-heads, 406M parameters (same as base)
bart-large base architecture finetuned on cnn summarization task
If my understanding is correct, shouldn't "12-layer" and "(same as base)" instead be "24-layer" and "(same as large)"?
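For reference, here is a minimal sketch of how I would check this with the transformers library (just my own verification, not something from the docs; the attribute names assume the current BartConfig):

```python
from transformers import AutoConfig

# Load the configuration of facebook/bart-large-cnn from the Hub
config = AutoConfig.from_pretrained("facebook/bart-large-cnn")

# The pretrained_models page counts encoder and decoder layers together,
# so bart-large would be 12 encoder + 12 decoder = 24 layers.
print("encoder_layers:", config.encoder_layers)             # expected: 12
print("decoder_layers:", config.decoder_layers)             # expected: 12
print("hidden size (d_model):", config.d_model)             # expected: 1024
print("attention heads:", config.encoder_attention_heads)   # expected: 16
```

If this prints 12 encoder layers and 12 decoder layers with d_model=1024, the checkpoint matches the bart-large architecture rather than bart-base (which uses 6+6 layers and d_model=768).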
I’m sorry if my understanding is wrong, or if someone has already noticed and fixed it.
Thank you in advance.
yusukemori
(I’m worried, but I’m excited about Transformers!)