---
license: apache-2.0
datasets:
- eriktks/conll2003
language:
- en
base_model:
- stefan-it/ettin-encoder-400m-tokenizer-fix
tags:
- ner
---

# ✨ Ettin 400M for NER

This repository hosts an Ettin 400M model that was fine-tuned on the CoNLL-2003 NER dataset with the awesome Flair library.

Please note the following caveats:

* ⚠️ To work around a tokenizer problem in ModernBERT/Ettin, this model was fine-tuned on a [forked and modified](https://huggingface.co/stefan-it/ettin-encoder-400m-tokenizer-fix) Ettin 400M model.
* ⚠️ At the moment, don't expect "uber" BERT-like performance; more experiments are needed. I am pretty sure that RoPE is causing this.

## 📝 Implementation

The model was trained using my [ModernBERT experiments](https://github.com/stefan-it/modern-bert-ner) repository. A hedged sketch of such a fine-tuning run is shown at the end of this card.

## 📊 Performance

A very basic hyper-parameter search was performed over five different seeds; the table below reports the micro F1-score on the CoNLL-2003 development set for each run, together with the average:

| Configuration          | Run 1 | Run 2 | Run 3     | Run 4 | Run 5 | Avg.         |
|------------------------|-------|-------|-----------|-------|-------|--------------|
| `bs16-e10-cs0-lr4e-05` | 96    | 96.17 | **96.31** | 96.19 | 96.2  | 96.17 ± 0.1  |
| `bs16-e10-cs0-lr3e-05` | 96.25 | 96.23 | 96.12     | 96.3  | 95.81 | 96.14 ± 0.18 |
| `bs16-e10-cs0-lr2e-05` | 96.09 | 96.24 | 95.88     | 96.1  | 96.12 | 96.09 ± 0.12 |
| `bs16-e10-cs0-lr5e-05` | 95.98 | 95.93 | 96.11     | 96.1  | 96    | 96.02 ± 0.07 |
| `bs16-e10-cs0-lr1e-05` | 95.77 | 95.8  | 96.14     | 96.01 | 95.84 | 95.91 ± 0.14 |

The score of the currently uploaded model is marked in bold.

## 📣 Usage

The following code can be used to test the model and recognize named entities in a given sentence:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the model
tagger = SequenceTagger.load("stefan-it/flair-ettin-400m-ner-conll03")

# Define an example sentence
sentence = Sentence("George Washington went to Washington very fast.")

# Predict named entities
tagger.predict(sentence)

# Print the recognized named entities
print("The following named entities are found:")

for entity in sentence.get_spans('ner'):
    print(entity)
```

This outputs:

```text
Span[0:2]: "George Washington" → PER (1.0000)
Span[4:5]: "Washington" → LOC (1.0000)
```
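
## 🔬 Fine-tuning sketch

For reference, here is a minimal sketch of how a comparable fine-tuning run could be set up with Flair, using the hyper-parameters of the best configuration above (batch size 16, 10 epochs, no document context, learning rate 4e-05). This is an illustrative assumption rather than the exact training script — the real scripts live in the [ModernBERT experiments](https://github.com/stefan-it/modern-bert-ner) repository — and details such as `use_context`, the subtoken pooling strategy, the tagging head settings, and the output path are guesses.

```python
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load the CoNLL-2003 corpus and build the NER label dictionary
corpus = CONLL_03()
label_type = "ner"
label_dictionary = corpus.make_label_dictionary(label_type=label_type)

# Use the tokenizer-fixed Ettin 400M encoder as fine-tunable embeddings
# (use_context=False is an assumption, mirroring the "cs0" part of the configuration name)
embeddings = TransformerWordEmbeddings(
    model="stefan-it/ettin-encoder-400m-tokenizer-fix",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=False,
)

# Plain linear tagging head on top of the transformer (no CRF, no RNN) — an assumption
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dictionary,
    tag_type=label_type,
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# Fine-tune with the hyper-parameters from the bs16-e10-cs0-lr4e-05 configuration
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ettin-400m-conll03",  # hypothetical output path
    learning_rate=4e-5,
    mini_batch_size=16,
    max_epochs=10,
)
```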