AnonBERT-ENR: Polish Personal Data Anonymization Model

Fine-tuned HerBERT model for Named Entity Recognition and anonymization of sensitive personal information in Polish text.

Achieves an 84% F1 score on the test set when identifying 25+ types of personal data entities for anonymization.

Model Description

AnonBERT-ENR is a specialized NER model fine-tuned from allegro/herbert-base-cased to detect and anonymize personal data in Polish documents. It is designed to help organizations comply with GDPR and other privacy regulations by automatically identifying sensitive information.
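
As a rough illustration of how a model like this is typically queried, the sketch below loads it through the Hugging Face `transformers` token-classification pipeline. It assumes the repository `Matela7/AnonBert-ENR` ships a tokenizer and a label mapping that match the entity types listed on this page. The helper names `build_ner_pipeline` and `detect_entities`, and the `min_score` threshold of 0.5, are hypothetical and not part of the model's own tooling.

```python
from typing import Callable, List, Tuple

MODEL_ID = "Matela7/AnonBert-ENR"  # repository name as given on this page

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Load the model as a token-classification pipeline (downloads weights)."""
    # Imported lazily so detect_entities can be exercised without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole-entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def detect_entities(
    ner: Callable[[str], List[dict]],
    text: str,
    min_score: float = 0.5,  # hypothetical threshold, tune on your own data
) -> List[Span]:
    """Run the pipeline and keep spans whose score is at least min_score."""
    spans = [
        (int(hit["start"]), int(hit["end"]), str(hit["entity_group"]))
        for hit in ner(text)
        if float(hit["score"]) >= min_score
    ]
    return sorted(spans)
```

Calling `detect_entities(build_ner_pipeline(), "Jan Kowalski mieszka w Warszawie.")` would then return character spans with their labels. The exact labels and scores depend on the published weights.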

Key Features

  • High Accuracy: 84% F1 score on diverse test data
  • 🇵🇱 Polish Language: Optimized for Polish text and naming conventions
  • 🔒 Privacy-Focused: Detects 25+ types of sensitive personal information
  • 🚀 Production Ready: Includes complete anonymization pipeline
  • 📊 Comprehensive Coverage: Names, IDs, contact info, health data, political views, and more
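
The anonymization pipeline bundled with the model is not shown on this page, so the following is only a minimal sketch of the usual replacement step: given character spans from an NER model, substitute each one with a placeholder. The function name `anonymize`, the `(start, end, label)` span format, and the `[LABEL]` placeholder template are assumptions made for illustration.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def anonymize(text: str, spans: Iterable[Span], template: str = "[{label}]") -> str:
    """Replace each detected span with a placeholder built from its label."""
    result = text
    last_start = len(text)
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        if end > last_start:
            continue  # skip a span that overlaps one already replaced
        result = result[:start] + template.format(label=label) + result[end:]
        last_start = start
    return result


text = "Jan Kowalski mieszka w Warszawie."
spans = [(0, 3, "NAME"), (4, 12, "SURNAME"), (23, 32, "CITY")]
print(anonymize(text, spans))  # [NAME] [SURNAME] mieszka w [CITY].
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution. When two spans overlap, this sketch keeps the one that starts later and drops the other.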

Supported Entity Types

The model can identify the following types of personal information:

| Entity Type | Description | Example |
|---|---|---|
| NAME | First name | Jan |
| SURNAME | Last name | Kowalski |
| EMAIL | Email address | [email protected] |
| PHONE | Phone number | +48 123 456 789 |
| PESEL | National ID number | 12345678901 |
| DOCUMENT | ID document number | ABC123456 |
| ADDRESS | Street address | ul. Marszałkowska 1 |
| CITY | City name | Warszawa |
| BANK_ACCOUNT | Bank account number | PL61109010140000071219812874 |
| CREDIT_CARD | Credit card number | 1234-5678-9012-3456 |
| DATE_BIRTH | Date of birth | 1990-05-20 |
| DATE | General date | 2024-01-15 |
| AGE | Age | 25 lat |
| SEX | Gender/sex | Mężczyzna, Kobieta |
| COMPANY | Company name | Allegro |
| SCHOOL | School name | Uniwersytet Warszawski |
| JOB | Job title | Dyrektor |
| USERNAME | Username/login | jan.kowalski |
| HEALTH | Health information | cukrzyca |
| RELIGION | Religious affiliation | katolik |
| POLITICAL | Political views | liberalny |
| ETHNICITY | Ethnicity | polska |
| ORIENTATION | Sexual orientation | heteroseksualny |
| RELATIVE | Family relation | matka |
| SECRET | Secret/confidential info | hasło: abc123 |
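
In practice not every entity type needs to be redacted in every document, so it can help to group the labels and select spans by group. The sketch below uses the 25 labels listed above. The grouping, the group names, and the helpers `labels_for` and `select_spans` are illustrative assumptions, not something the model defines. The `special_category` group is meant to loosely mirror the special categories of personal data in GDPR Article 9.

```python
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive

# Labels come from the table above; the grouping itself is an assumption.
ENTITY_GROUPS: Dict[str, Set[str]] = {
    "identity": {"NAME", "SURNAME", "PESEL", "DOCUMENT", "USERNAME",
                 "DATE_BIRTH", "AGE", "SEX", "RELATIVE"},
    "contact": {"EMAIL", "PHONE", "ADDRESS", "CITY"},
    "financial": {"BANK_ACCOUNT", "CREDIT_CARD"},
    "affiliation": {"COMPANY", "SCHOOL", "JOB"},
    "special_category": {"HEALTH", "RELIGION", "POLITICAL",
                         "ETHNICITY", "ORIENTATION"},
    "other": {"DATE", "SECRET"},
}


def labels_for(groups: Iterable[str]) -> Set[str]:
    """Return the union of labels belonging to the requested groups."""
    return set().union(*(ENTITY_GROUPS[group] for group in groups))


def select_spans(spans: Iterable[Span], groups: Iterable[str]) -> List[Span]:
    """Keep only spans whose label falls in one of the requested groups."""
    wanted = labels_for(groups)
    return [span for span in spans if span[2] in wanted]
```

For example, `select_spans(spans, ["special_category", "financial"])` would keep only health, religion, political, ethnicity, orientation, bank account, and credit card spans for redaction.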

Model size: 0.1B params (Safetensors, tensor type F32)

Model tree for Matela7/AnonBert-ENR

Fine-tuned from allegro/herbert-base-cased.

Evaluation results