VABERT-EyeBERT

A Named Entity Recognition (NER) model for extracting Visual Acuity (VA) measurements from clinical ophthalmology text. It is fine-tuned from EyeBERT, a domain-specific BERT model for ophthalmology.

Model Description

VABERT-EyeBERT identifies and extracts visual acuity measurements from clinical text, enabling automated processing of ophthalmological records and reports.

Base Model: qnguy3n/eyebert-base
Task: Token Classification (Named Entity Recognition)
Domain: Medical - Ophthalmology
Language: English

Requirements

transformers >= 4.25
huggingface_hub >= 0.14
torch

Installation

Download the base model:

huggingface-cli download qnguy3n/eyebert-base --local-dir models/eyebert-base

Verify your environment:

import transformers
import huggingface_hub

print(f"transformers version: {transformers.__version__}")
print(f"huggingface_hub version: {huggingface_hub.__version__}")
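
The printed versions can also be compared against the minimums listed under Requirements. The following is a minimal sketch that uses only the standard library; the names `MINIMUMS`, `version_tuple`, `meets_minimum` and `check_environment` are illustrative and not part of any library, and the version parsing is deliberately simple (it reads only the leading integers of each dotted component).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Requirements section of this card.
MINIMUMS = {"transformers": "4.25", "huggingface_hub": "0.14"}


def version_tuple(v):
    """Turn '4.25.1' or '4.25.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(minimums=MINIMUMS):
    """Report 'ok', 'missing', or a 'too old' message for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```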

Usage

Basic Inference

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("qnguy3n/eyebert-base")
model = AutoModelForTokenClassification.from_pretrained("qnguy3n/vabert-eyebert")

# Check label mappings
id2label = model.config.id2label
label2id = model.config.label2id

def predict_ner(text, model, tokenizer, verbose=False):
    """
    Predict NER spans for input text using BIO tagging
    """
    encoding = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        return_offsets_mapping=True
    )

    # The model's forward pass does not accept offset_mapping, so remove it first
    offset_mapping = encoding.pop("offset_mapping")
    encoding = encoding.to(model.device)

    model.eval()
    with torch.no_grad():
        outputs = model(**encoding)

    predictions = outputs.logits.argmax(dim=-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0])

    labels = []
    for token, pred in zip(tokens, predictions):
        if token in ["[CLS]", "[SEP]", "[PAD]"]:
            continue
        labels.append(model.config.id2label[pred.item()])

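    # Re-attach the offsets so the helper can map tokens back to character positions
    encoding["offset_mapping"] = offset_mapping

    # NOTE: get_ents_from_bio is not defined or imported in this snippet;
    # it must be provided separately before predict_ner is called.
    spans = get_ents_from_bio(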
        tokens=encoding,
        labels=labels,
        sent=text,
        verbose=verbose
    )

    return spans

# Example usage
text = "Visual acuity: Right Eye: 6/5 Unaided Left Eye: 6/6 Unaided"
spans = predict_ner(text, model, tokenizer)
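
The example above calls `get_ents_from_bio`, which this card does not define. As a rough illustration of what such a step might do, the sketch below merges per-token BIO labels and character offsets into entity spans. It is not the author's helper: the function name `bio_to_spans`, its signature, and the label name `VA` are assumptions, and the model's real label names should be read from `model.config.id2label`. It expects one label per token, including special tokens, so the labels stay aligned with the offsets.

```python
def bio_to_spans(text, offsets, labels):
    """Merge per-token BIO labels into character-level entity spans.

    offsets: list of (start, end) character offsets, one per token
    labels:  list of BIO tags such as "B-VA", "I-VA", "O", aligned with offsets
    """
    spans = []
    current = None  # the span currently being built, if any

    for (start, end), label in zip(offsets, labels):
        prefix, _, ent_type = label.partition("-")
        # Special tokens such as [CLS] and [SEP] carry empty (0, 0) offsets.
        is_real_token = end > start

        continues = (
            is_real_token
            and prefix == "I"
            and current is not None
            and current["label"] == ent_type
        )
        if continues:
            current["end"] = end
            continue

        # Anything else closes the open span.
        if current is not None:
            spans.append(current)
            current = None

        # A stray I- tag with no matching open span starts a new span.
        if is_real_token and prefix in ("B", "I"):
            current = {"label": ent_type, "start": start, "end": end}

    if current is not None:
        spans.append(current)

    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written offsets and labels standing in for real model output.
example_text = "Right Eye: 6/5 Unaided"
example_offsets = [(0, 0), (0, 5), (6, 9), (9, 10), (11, 12), (12, 13), (13, 14), (15, 22), (0, 0)]
example_labels = ["O", "O", "O", "O", "B-VA", "I-VA", "I-VA", "O", "O"]
example_spans = bio_to_spans(example_text, example_offsets, example_labels)
print(example_spans)
```

With real model output, the offsets would come from `offset_mapping[0].tolist()` and the labels from mapping every predicted id through `model.config.id2label`, without filtering out special tokens first.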

Intended Use

This model is designed for:

Extracting visual acuity measurements from clinical notes
Processing ophthalmological medical records
Research in clinical NLP for ophthalmology
Automated structuring of unstructured clinical text
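
As an illustration of the structuring use case, an extracted span such as "6/5" can be turned into numeric fields for downstream analysis. The sketch below is a hypothetical post-processing step, not part of the model: `parse_snellen` and its output keys are illustrative names, and it handles only Snellen-style fractions (for example 6/6 or 20/40), returning `None` for any other notation.

```python
import re

# Matches Snellen-style fractions such as "6/6", "20/40" or "6 / 7.5".
SNELLEN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)\s*$")


def parse_snellen(value):
    """Parse a Snellen fraction into numeric fields, or return None."""
    match = SNELLEN.match(value)
    if match is None:
        return None
    numerator = float(match.group(1))
    denominator = float(match.group(2))
    if numerator == 0 or denominator == 0:
        return None  # not a meaningful acuity value
    return {
        "numerator": numerator,
        "denominator": denominator,
        "decimal": numerator / denominator,  # decimal acuity
    }


for raw in ["6/12", "20/40", "not a fraction"]:
    print(raw, "->", parse_snellen(raw))
```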

Limitations

Trained specifically for ophthalmological text and may not generalize to other medical domains
Performance depends on text format and notation conventions
Requires domain-specific tokenizer (EyeBERT) for optimal results

Citation

@misc{vabert_eyebert,
  title={VABERT-EyeBERT: Visual Acuity NER for Ophthalmology},
  author={Nguyen, Quang},
  year={2024},
  publisher={Hugging Face}
}

License

Apache 2.0

Contact

For questions or issues, please open an issue on the model repository.
