---
language:
  - pl
license: mit
tags:
  - token-classification
  - named-entity-recognition
  - polish
  - anonymization
  - privacy
  - gdpr
datasets:
  - custom
metrics:
  - f1
  - precision
  - recall
base_model: allegro/herbert-base-cased
model-index:
  - name: AnonBERT-ENR
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        metrics:
          - type: f1
            value: 0.84
            name: F1 Score
widget:
  - text: >-
      Nazywam się Jan Kowalski i mieszkam w Warszawie przy ulicy Marszałkowskiej
      1. Mój email to [email protected], a numer telefonu +48 123 456
      789.
    example_title: Personal Data Example
  - text: 'PESEL: 12345678901, dowód osobisty: ABC123456. Data urodzenia: 1990-05-20.'
    example_title: ID Numbers Example
---

AnonBERT-ENR: Polish Personal Data Anonymization Model

A HerBERT model fine-tuned for Named Entity Recognition (NER) and anonymization of sensitive personal information in Polish text.

Achieves an 84% F1 score on the test set in detecting 25+ types of personal data entities.
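The card reports F1 (with precision and recall listed in the metadata) but does not say how predicted entities were matched against the gold annotations. One common convention is strict entity-level scoring, where a prediction counts only if its boundaries and type both match a gold entity. The sketch below assumes that convention and character-offset spans; it is not necessarily how the 84% figure was computed.

```python
from typing import Iterable, Set, Tuple

# An entity is identified by (start_char, end_char, entity_type).
Entity = Tuple[int, int, str]


def entity_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall and F1.

    A predicted entity is a true positive only if its start, end and
    type all match a gold entity exactly.
    """
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Offsets refer to "Nazywam się Jan Kowalski i mieszkam w Warszawie".
    gold = [(12, 15, "NAME"), (16, 24, "SURNAME"), (38, 47, "CITY")]
    # The third prediction has the right span but the wrong type.
    pred = [(12, 15, "NAME"), (16, 24, "SURNAME"), (38, 47, "ADDRESS")]
    p, r, f = entity_prf(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.67 F1=0.67
```

Under a lenient (overlap-based) convention the same predictions would score higher, so the matching rule matters when comparing reported F1 values.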

Model Description

AnonBERT-ENR is a specialized NER model fine-tuned from allegro/herbert-base-cased to detect and anonymize personal data in Polish documents. It is designed to help organizations comply with the GDPR and other privacy regulations by automatically identifying sensitive information.
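As a token-classification model, it predicts one label per token, and those labels must be merged into entity spans before anything can be anonymized. The card does not document the label scheme, so the sketch below assumes BIO tags such as `B-NAME`/`I-NAME` (check `id2label` in the model's `config.json`) and per-token character offsets, for example from a fast tokenizer's offset mapping. In practice, the transformers `pipeline` with `aggregation_strategy="simple"` performs this grouping for you.

```python
from typing import List, Optional, Tuple

# (start_char, end_char, label) for each token, e.g. offsets from a fast
# tokenizer's offset mapping paired with the model's predicted labels.
Token = Tuple[int, int, str]
Span = Tuple[int, int, str]


def bio_to_spans(tokens: List[Token]) -> List[Span]:
    """Merge BIO-tagged tokens into (start, end, type) entity spans.

    An I- tag that does not continue an open entity of the same type
    starts a new entity (a lenient convention).
    """
    spans: List[Span] = []
    cur_type: Optional[str] = None
    cur_start = cur_end = 0
    for start, end, label in tokens:
        if label == "O":
            # Outside any entity: close the open one, if any.
            if cur_type is not None:
                spans.append((cur_start, cur_end, cur_type))
                cur_type = None
            continue
        prefix, ent_type = label.split("-", 1)
        if prefix == "I" and ent_type == cur_type:
            cur_end = end  # continue the open entity
            continue
        # A B- tag, or an I- tag that cannot continue: start a new entity.
        if cur_type is not None:
            spans.append((cur_start, cur_end, cur_type))
        cur_start, cur_end, cur_type = start, end, ent_type
    if cur_type is not None:
        spans.append((cur_start, cur_end, cur_type))
    return spans


if __name__ == "__main__":
    text = "Jan Kowalski, Uniwersytet Warszawski"
    tokens = [(0, 3, "B-NAME"), (4, 12, "B-SURNAME"), (12, 13, "O"),
              (14, 25, "B-SCHOOL"), (26, 36, "I-SCHOOL")]
    for s, e, t in bio_to_spans(tokens):
        print(t, text[s:e])
```

Entity names such as `BANK_ACCOUNT` use underscores, so splitting the label on the first hyphen cleanly separates the BIO prefix from the type.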

Key Features

  • High Performance: 84% F1 score on diverse test data
  • 🇵🇱 Polish Language: Optimized for Polish text and naming conventions
  • 🔒 Privacy-Focused: Detects 25+ types of sensitive personal information
  • 🚀 Production Ready: Includes a complete anonymization pipeline
  • 📊 Comprehensive Coverage: Names, IDs, contact info, health data, political views, and more
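The anonymization step can be sketched as replacing each detected span with a placeholder naming its type. This sketch assumes non-overlapping entities shaped like the output of a transformers token-classification pipeline with `aggregation_strategy="simple"` (dicts with `entity_group`, `start` and `end` keys). The `[TYPE]` placeholder format and the repository id shown in the comment are illustrative assumptions, not a description of the bundled pipeline.

```python
from typing import Dict, List


def anonymize(text: str, entities: List[Dict]) -> str:
    """Replace each detected entity with a [TYPE] placeholder.

    Assumes non-overlapping entities with "entity_group", "start" and
    "end" keys, as produced by a transformers token-classification
    pipeline with aggregation_strategy="simple".
    """
    result = text
    # Work from the last entity backwards so that the character offsets
    # of earlier entities remain valid after each substitution.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        result = result[: ent["start"]] + placeholder + result[ent["end"]:]
    return result


if __name__ == "__main__":
    # With the real model, entities would come from something like
    # (the repository id is an assumption):
    #   from transformers import pipeline
    #   ner = pipeline("token-classification", model="Matela7/AnonBert-ENR",
    #                  aggregation_strategy="simple")
    #   entities = ner(text)
    text = "Nazywam się Jan Kowalski i mieszkam w Warszawie."
    entities = [
        {"entity_group": "NAME", "start": 12, "end": 15},
        {"entity_group": "SURNAME", "start": 16, "end": 24},
        {"entity_group": "CITY", "start": 38, "end": 47},
    ]
    print(anonymize(text, entities))
    # Nazywam się [NAME] [SURNAME] i mieszkam w [CITY].
```

Type-only placeholders discard identity entirely; if downstream analysis needs to tell two different people apart, numbered placeholders (for example `[NAME_1]`, `[NAME_2]`) are a common alternative.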

Supported Entity Types

The model can identify the following types of personal information:

| Entity Type | Description | Example |
|---|---|---|
| NAME | First name | Jan |
| SURNAME | Last name | Kowalski |
| EMAIL | Email address | [email protected] |
| PHONE | Phone number | +48 123 456 789 |
| PESEL | National ID number (PESEL) | 12345678901 |
| DOCUMENT | Identity document number | ABC123456 |
| ADDRESS | Street address | ul. Marszałkowska 1 |
| CITY | City name | Warszawa |
| BANK_ACCOUNT | Bank account number | PL61109010140000071219812874 |
| CREDIT_CARD | Credit card number | 1234-5678-9012-3456 |
| DATE_BIRTH | Date of birth | 1990-05-20 |
| DATE | General date | 2024-01-15 |
| AGE | Age | 25 lat (25 years) |
| SEX | Gender/sex | Mężczyzna, Kobieta (man, woman) |
| COMPANY | Company name | Allegro |
| SCHOOL | School name | Uniwersytet Warszawski |
| JOB | Job title | Dyrektor (director) |
| USERNAME | Username/login | jan.kowalski |
| HEALTH | Health information | cukrzyca (diabetes) |
| RELIGION | Religious affiliation | katolik (Catholic) |
| POLITICAL | Political views | liberalny (liberal) |
| ETHNICITY | Ethnicity | polska (Polish) |
| ORIENTATION | Sexual orientation | heteroseksualny (heterosexual) |
| RELATIVE | Family relation | matka (mother) |
| SECRET | Secret/confidential info | hasło: abc123 (password: abc123) |
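If the model uses a BIO tagging scheme (an assumption; the card does not document its labels), the 25 types listed in the table would give 2 × 25 + 1 = 51 token labels: `O` plus a `B-`/`I-` pair per type. A minimal sketch of building the label maps a token-classification head needs:

```python
from typing import Dict, List, Tuple

# Entity types from the Supported Entity Types table.
ENTITY_TYPES: List[str] = [
    "NAME", "SURNAME", "EMAIL", "PHONE", "PESEL", "DOCUMENT", "ADDRESS",
    "CITY", "BANK_ACCOUNT", "CREDIT_CARD", "DATE_BIRTH", "DATE", "AGE",
    "SEX", "COMPANY", "SCHOOL", "JOB", "USERNAME", "HEALTH", "RELIGION",
    "POLITICAL", "ETHNICITY", "ORIENTATION", "RELATIVE", "SECRET",
]


def build_bio_labels(entity_types: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Return (label2id, id2label) for 'O' plus a B-/I- pair per entity type."""
    labels = ["O"]
    for t in entity_types:
        labels += [f"B-{t}", f"I-{t}"]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


if __name__ == "__main__":
    label2id, id2label = build_bio_labels(ENTITY_TYPES)
    print(len(label2id))  # 51
```

When fine-tuning, such maps could be passed as the `num_labels`, `id2label` and `label2id` keyword arguments to `AutoModelForTokenClassification.from_pretrained("allegro/herbert-base-cased", ...)`; whether AnonBERT-ENR was trained exactly this way is not stated here.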