---
language:
- pl
license: mit
tags:
- token-classification
- named-entity-recognition
- polish
- anonymization
- privacy
- gdpr
datasets:
- custom
metrics:
- f1
- precision
- recall
base_model: allegro/herbert-base-cased
model-index:
- name: AnonBERT-ENR
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.84
      name: F1 Score
widget:
- text: "Nazywam się Jan Kowalski i mieszkam w Warszawie przy ulicy Marszałkowskiej 1. Mój email to jan.kowalski@example.com, a numer telefonu +48 123 456 789."
  example_title: "Personal Data Example"
- text: "PESEL: 12345678901, dowód osobisty: ABC123456. Data urodzenia: 1990-05-20."
  example_title: "ID Numbers Example"
---

# AnonBERT-ENR: Polish Personal Data Anonymization Model

**Fine-tuned HerBERT model for Named Entity Recognition and anonymization of sensitive personal information in Polish text.**

> Achieves **84% F1 score** on test set for identifying and anonymizing 25+ types of personal data entities.

## Model Description

AnonBERT-ENR is a specialized NER model fine-tuned on [allegro/herbert-base-cased](https://huggingface.co/allegro/herbert-base-cased) for detecting and anonymizing personal data in Polish documents. The model is designed to help organizations comply with GDPR and other privacy regulations by automatically identifying sensitive information.

### Key Features

- ✅ **High Accuracy**: 84% F1 score on diverse test data
- 🇵🇱 **Polish Language**: Optimized for Polish text and naming conventions
- 🔒 **Privacy-Focused**: Detects 25+ types of sensitive personal information
- 🚀 **Production Ready**: Includes complete anonymization pipeline
- 📊 **Comprehensive Coverage**: Names, IDs, contact info, health data, political views, and more

## Supported Entity Types

The model can identify the following types of personal information:

| Entity Type | Description | Example |
|------------|-------------|---------|
| `NAME` | First name | Jan |
| `SURNAME` | Last name | Kowalski |
| `EMAIL` | Email address | jan@example.pl |
| `PHONE` | Phone number | +48 123 456 789 |
| `PESEL` | National ID number | 12345678901 |
| `DOCUMENT` | ID document number | ABC123456 |
| `ADDRESS` | Street address | ul. Marszałkowska 1 |
| `CITY` | City name | Warszawa |
| `BANK_ACCOUNT` | Bank account number | PL61109010140000071219812874 |
| `CREDIT_CARD` | Credit card number | 1234-5678-9012-3456 |
| `DATE_BIRTH` | Date of birth | 1990-05-20 |
| `DATE` | General date | 2024-01-15 |
| `AGE` | Age | 25 lat |
| `SEX` | Gender/sex | Mężczyzna, Kobieta |
| `COMPANY` | Company name | Allegro |
| `SCHOOL` | School name | Uniwersytet Warszawski |
| `JOB` | Job title | Dyrektor |
| `USERNAME` | Username/login | jan.kowalski |
| `HEALTH` | Health information | cukrzyca |
| `RELIGION` | Religious affiliation | katolik |
| `POLITICAL` | Political views | liberalny |
| `ETHNICITY` | Ethnicity | polska |
| `ORIENTATION` | Sexual orientation | heteroseksualny |
| `RELATIVE` | Family relation | matka |
| `SECRET` | Secret/confidential info | hasło: abc123 |
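
To illustrate how detected entities of the types above might be turned into anonymized text, here is a minimal sketch. It is not the model's own pipeline. The `Entity` tuple, the `span_of` helper, and the `anonymize` function are hypothetical names introduced for this example. It assumes detections arrive as non-overlapping character spans labeled with an entity type from the table.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A detected span (hypothetical schema, not the model's actual output format)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # an entity type from the table, e.g. "NAME"


def span_of(text: str, value: str, label: str) -> Entity:
    """Build an Entity for the first occurrence of `value` (stand-in for model output)."""
    start = text.index(value)
    return Entity(start, start + len(value), label)


def anonymize(text: str, entities: list[Entity]) -> str:
    """Replace each span with a bracketed placeholder such as [NAME].

    Assumes the spans do not overlap.
    """
    result = text
    # Replace from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        result = result[:ent.start] + f"[{ent.label}]" + result[ent.end:]
    return result


text = "Nazywam się Jan Kowalski i mieszkam w Warszawie."
entities = [
    span_of(text, "Jan", "NAME"),
    span_of(text, "Kowalski", "SURNAME"),
    span_of(text, "Warszawie", "CITY"),
]
anonymized = anonymize(text, entities)
print(anonymized)
```

Replacing spans from right to left avoids recomputing offsets after each substitution. A real pipeline would also need to merge subword tokens into spans and resolve overlapping detections, which this sketch leaves out.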