Is it possible to remove all other languages from NLLB200 except English and German?

Greetings Everyone,

I am starting to learn deep learning (especially machine translation). Recently I found that Facebook released pre-trained models like M2M100 and NLLB200 on Hugging Face.

But I have a few questions about these models. As you all know, NLLB200 can translate across more than 200 × 200 = 40,000 directions because it is designed for multilingual use. That’s why these pre-trained models are so large, which brings me to my question:

“Is it possible to delete or split this pre-trained model into only two languages?”

What I mean is: delete or split off all other languages and directions except English and German, so the model only translates English–German and German–English.

(I only need 2 directions, not 40,000.)

By doing this, the model would shrink to a smaller size, which is what I need.

Your expert advice and support will be invaluable to me, and I eagerly await your reply.


I’m also interested in doing this with about 16 languages (256 directions). I would imagine it would take tracing out which embeddings those language codes use and extracting only those. I would have to look at the model graph and try to determine whether that happens in specific layers, and which parts could be pruned and which would have to remain to make the model smaller.
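For anyone who wants to poke at this, here is a rough sketch of that inspection step, assuming the transformers library and the facebook/nllb-200-distilled-600M checkpoint. It just looks up the language-code token IDs and the shared embedding matrix, which is about as far as “tracing out the embeddings” goes, since everything else is shared across languages.

```python
# Rough sketch: where the "language-specific" parameters actually live.
# Assumes transformers is installed; facebook/nllb-200-distilled-600M is used
# here, but any NLLB-200 checkpoint should behave the same way.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

# Each language is represented by a single extra token in the shared vocabulary.
for code in ("eng_Latn", "deu_Latn", "fra_Latn"):
    print(code, tokenizer.convert_tokens_to_ids(code))

# The embedding matrix is shared by every language (roughly 256k subword rows).
# Dropping the ~200 unused language-code rows barely changes the parameter
# count; the bulk of the model is in the shared encoder/decoder layers.
embeddings = model.get_input_embeddings().weight
print(embeddings.shape)
```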


Hey, I may have replied to another one of your posts, but I just wanted to help you out here:

You can’t “remove” the other languages from NLLB or shrink it to only EN↔DE. The model isn’t modular: all languages share the same encoder/decoder layers. The language codes are just tokens, and deleting them usually breaks the tokenizer/model alignment.

NLLB doesn’t store language-specific blocks you can prune out; the only per-language parameters are the language-code token embeddings, so removing 198 languages won’t meaningfully reduce the model size.
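To make that concrete, here’s a minimal EN→DE example (a sketch, assuming the transformers library and the facebook/nllb-200-distilled-600M checkpoint): the only thing that selects German is a single target-language token forced at the start of generation, so there is no separate EN–DE sub-network you could carve out.

```python
# Minimal EN->DE sketch: the language codes are ordinary tokens steering one
# shared network, not separate per-language weights.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("The weather is nice today.", return_tensors="pt")

# The target language is chosen purely by forcing its code as the first
# decoder token; everything else is the same shared encoder/decoder.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```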

What to do instead:
• Just fine-tune the full model on your EN↔DE data
• Or distill a smaller student model from it
• Or quantize (INT8/INT4)
• Or use LoRA/QLoRA to make fine-tuning cheaper (a short sketch follows this list)
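As a starting point for the LoRA option, here is a minimal sketch using the peft library with the facebook/nllb-200-distilled-600M checkpoint; the rank, alpha, dropout and target modules are illustrative choices, not tuned values.

```python
# Hedged sketch of wrapping NLLB with LoRA adapters via peft.
# Hyperparameters below are placeholders; tune them for your EN<->DE data.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Attention projections in the NLLB (M2M100-style) transformer blocks.
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trained
```

You would then train the wrapped model on your EN↔DE pairs with the usual seq2seq training setup; only the adapters are updated, so fine-tuning gets much cheaper, but note the full base model is still loaded at inference unless you also quantize it.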

If you want a working, 2025-compatible fine-tuning setup, I wrote an updated tutorial (you can ignore the tokenizer section if you’re not adding new languages).

You can see it on my profile, since HF won’t let me post the link…
