VABERT-EyeBERT

A Named Entity Recognition (NER) model designed to extract Visual Acuity (VA) measurements from clinical ophthalmology text. It is fine-tuned from EyeBERT, a domain-specific BERT model for ophthalmology.

Model Description

VABERT-EyeBERT identifies and extracts visual acuity measurements from clinical text, enabling automated processing of ophthalmological records and reports.

  • Base Model: qnguy3n/eyebert-base
  • Task: Token Classification (Named Entity Recognition)
  • Domain: Medical - Ophthalmology
  • Language: English

Requirements

transformers >= 4.25
huggingface_hub >= 0.14
torch

Installation

Download the base model:

huggingface-cli download qnguy3n/eyebert-base --local-dir models/eyebert-base
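
The same download can also be scripted from Python, which may be convenient in notebooks or pipelines. The following is a minimal sketch, assuming `huggingface_hub` is installed and that `snapshot_download` accepts the `local_dir` argument in the installed version. The wrapper name `download_base_model` and its `revision` parameter are hypothetical and used only for illustration.

```python
from typing import Optional


def download_base_model(
    repo_id: str = "qnguy3n/eyebert-base",
    local_dir: str = "models/eyebert-base",
    revision: Optional[str] = None,
) -> str:
    """Download a full model repository and return the local folder path."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import snapshot_download

    # revision=None lets the Hub resolve the repository's default branch.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir, revision=revision)
```

Calling `download_base_model()` with no arguments targets the same repository and local directory as the CLI command above.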

Verify your environment:

import transformers
import huggingface_hub

print(f"transformers version: {transformers.__version__}")
print(f"huggingface_hub version: {huggingface_hub.__version__}")
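
Printing the versions leaves the comparison against the minimums to the reader. As a hedged sketch, the check can be automated with the standard library alone. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are assumptions made for this example, and the simplified parser only compares leading numeric components, so it is not a full PEP 440 implementation.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions taken from the Requirements section of this card.
MINIMUMS = {"transformers": "4.25", "huggingface_hub": "0.14"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '4.30.0.dev0' -> (4, 30, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Compare two version strings by their numeric components."""
    return parse_version(installed) >= parse_version(required)


def check_environment(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, Optional[bool]]:
    """Map each package to True/False, or None when it is not installed."""
    results: Dict[str, Optional[bool]] = {}
    for package, required in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            results[package] = None
            continue
        results[package] = meets_minimum(installed, required)
    return results
```

`check_environment()` returns a dictionary keyed by package name, so a `False` or `None` value points directly at the package that needs to be installed or upgraded.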

Usage

Basic Inference

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("qnguy3n/eyebert-base")
model = AutoModelForTokenClassification.from_pretrained("qnguy3n/vabert-eyebert")

# Check label mappings
id2label = model.config.id2label
label2id = model.config.label2id

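def predict_ner(text, model, tokenizer, verbose=False):
    """
    Predict visual acuity entity spans in `text` using BIO tagging.

    Relies on a `get_ents_from_bio` helper that is not defined in this
    card; it must be available in scope before this function is called.
    """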
    encoding = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        return_offsets_mapping=True
    )

    # Remove the offsets before the forward pass; the model does not accept them
    offset_mapping = encoding.pop("offset_mapping")
    encoding = encoding.to(model.device)

    model.eval()
    with torch.no_grad():
        outputs = model(**encoding)

    predictions = outputs.logits.argmax(dim=-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0])

    # Map each predicted id to its label string, skipping special tokens
    labels = []
    for token, pred in zip(tokens, predictions):
        if token in ["[CLS]", "[SEP]", "[PAD]"]:
            continue
        labels.append(model.config.id2label[pred.item()])

    # Re-attach the character offsets so the helper can map tokens back to `text`
    encoding["offset_mapping"] = offset_mapping

    spans = get_ents_from_bio(
        tokens=encoding,
        labels=labels,
        sent=text,
        verbose=verbose
    )

    return spans

# Example usage
text = "Visual acuity: Right Eye: 6/5 Unaided Left Eye: 6/6 Unaided"
spans = predict_ner(text, model, tokenizer)
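
The `get_ents_from_bio` helper called in `predict_ner` is not defined in this card. To illustrate the decoding step only, the sketch below merges BIO tags into character spans using tokenizer offsets. The function name `bio_to_spans`, its signature, and the `VA` label name are assumptions made for this example and may differ from the author's helper and from the model's actual label set. It expects one label per token, special tokens included.

```python
from typing import Dict, List, Tuple


def bio_to_spans(
    labels: List[str],
    offsets: List[Tuple[int, int]],
    text: str,
) -> List[Dict]:
    """Merge token-level BIO labels into character-level entity spans."""
    spans: List[Dict] = []
    current = None  # the span currently being built, if any

    for label, (start, end) in zip(labels, offsets):
        if start == end:
            # Special tokens such as [CLS] and [SEP] carry empty (0, 0) offsets.
            continue
        prefix, _, ent_type = label.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current["label"] != ent_type)
        )
        if starts_new:
            # A B- tag, or an I- tag with no matching open span, opens a span.
            if current is not None:
                spans.append(current)
            current = {"label": ent_type, "start": start, "end": end}
        elif prefix == "I":
            # Continuation of the open span: extend its right edge.
            current["end"] = end
        else:
            # An O tag closes any open span.
            if current is not None:
                spans.append(current)
                current = None

    if current is not None:
        spans.append(current)

    # Attach the surface text for each span.
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written tokens and labels for illustration; not real model output.
example_text = "Right Eye: 6/5 Unaided"
example_offsets = [(0, 0), (0, 5), (6, 9), (9, 10), (11, 12), (12, 13), (13, 14), (15, 22), (0, 0)]
example_labels = ["O", "O", "O", "O", "B-VA", "I-VA", "I-VA", "O", "O"]
example_spans = bio_to_spans(example_labels, example_offsets, example_text)
print(example_spans)
```

With a real model, the offsets would come from `offset_mapping[0].tolist()` and the labels from applying `model.config.id2label` to every predicted id, without dropping the special tokens, so that both lists stay aligned.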

Intended Use

This model is designed for:

  • Extracting visual acuity measurements from clinical notes
  • Processing ophthalmological medical records
  • Research in clinical NLP for ophthalmology
  • Automated structuring of unstructured clinical text

Limitations

  • Trained specifically for ophthalmological text and may not generalize to other medical domains
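  • Performance depends on text format and notation conventions; measurements written in a layout or notation unlike the training data may be extracted less reliably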
  • Requires the domain-specific EyeBERT tokenizer (qnguy3n/eyebert-base) for optimal results

Citation

@misc{vabert_eyebert,
  title={VABERT-EyeBERT: Visual Acuity NER for Ophthalmology},
  author={Nguyen, Quang},
  year={2024},
  publisher={Hugging Face}
}

License

Apache 2.0

Contact

For questions or issues, please open an issue on the model repository.
