VABERT-EyeBERT
A Named Entity Recognition (NER) model designed specifically for Visual Acuity (VA) extraction from clinical ophthalmology text. It is fine-tuned from EyeBERT, a domain-specific BERT model for ophthalmology.
Model Description
VABERT-EyeBERT identifies and extracts visual acuity measurements from clinical text, enabling automated processing of ophthalmological records and reports.
- Base Model: qnguy3n/eyebert-base
- Task: Token Classification (Named Entity Recognition)
- Domain: Medical - Ophthalmology
- Language: English
Requirements
transformers >= 4.25
huggingface_hub >= 0.14
torch
Installation
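The Python dependencies listed under Requirements can be installed with pip; the version pins below simply mirror those minimums:

pip install "transformers>=4.25" "huggingface_hub>=0.14" torch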
Download the base model:
huggingface-cli download qnguy3n/eyebert-base --local-dir models/eyebert-base
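If you prefer to stay in Python, the same download can be done with snapshot_download from huggingface_hub; the local directory below just mirrors the CLI example and can be any path.

from huggingface_hub import snapshot_download

# Download the base model repository to a local directory
# (equivalent to the huggingface-cli command above)
snapshot_download(repo_id="qnguy3n/eyebert-base", local_dir="models/eyebert-base")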
Verify your environment:
import transformers
import huggingface_hub
print(f"transformers version: {transformers.__version__}")
print(f"huggingface_hub version: {huggingface_hub.__version__}")
Usage
Basic Inference
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("qnguy3n/eyebert-base")
model = AutoModelForTokenClassification.from_pretrained("qnguy3n/vabert-eyebert")
# Check label mappings
id2label = model.config.id2label
label2id = model.config.label2id
def predict_ner(text, model, tokenizer, verbose=False):
    """
    Predict visual acuity NER spans for the input text using BIO tagging.
    """
    encoding = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        return_offsets_mapping=True,
    )
    # The model's forward pass does not accept offset_mapping, so remove it first
    offset_mapping = encoding.pop("offset_mapping")
    encoding = encoding.to(model.device)

    model.eval()
    with torch.no_grad():
        outputs = model(**encoding)

    predictions = outputs.logits.argmax(dim=-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0])

    # Keep predicted labels for regular tokens only, skipping special tokens
    labels = []
    for token, pred in zip(tokens, predictions):
        if token in ["[CLS]", "[SEP]", "[PAD]"]:
            continue
        labels.append(model.config.id2label[pred.item()])

    # Restore offset_mapping so the span-extraction helper can map labels
    # back to character positions in the original text
    encoding["offset_mapping"] = offset_mapping
    spans = get_ents_from_bio(
        tokens=encoding,
        labels=labels,
        sent=text,
        verbose=verbose,
    )
    return spans

# Example usage
text = "Visual acuity: Right Eye: 6/5 Unaided Left Eye: 6/6 Unaided"
spans = predict_ner(text, model, tokenizer)
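predict_ner delegates span construction to a get_ents_from_bio helper that is not defined in this card. The sketch below is one possible implementation, assuming fast-tokenizer offset mappings (where special tokens map to the offset (0, 0)) and a list-of-dicts return format; treat the signature and output shape as assumptions rather than the author's original utility.

def get_ents_from_bio(tokens, labels, sent, verbose=False):
    """Turn BIO labels plus offset mappings into character-level entity spans."""
    # Offsets for the single sentence in the batch; (0, 0) entries are special
    # tokens, which were also skipped when building `labels`, so dropping them
    # keeps the two sequences aligned.
    offsets = [(s, e) for s, e in tokens["offset_mapping"][0].tolist() if s != e]

    spans = []
    current = None  # [start_char, end_char, entity_type]
    for (start, end), label in zip(offsets, labels):
        if label.startswith("B-"):
            if current:
                spans.append(current)
            current = [start, end, label[2:]]
        elif label.startswith("I-") and current and label[2:] == current[2]:
            current[1] = end  # extend the open span to this token
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)

    results = [{"start": s, "end": e, "label": t, "text": sent[s:e]} for s, e, t in spans]
    if verbose:
        for r in results:
            print(r)
    return results

Each returned dictionary carries the character offsets, the predicted entity type (with the B-/I- prefix stripped), and the matching slice of the original text.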
Intended Use
This model is designed for:
- Extracting visual acuity measurements from clinical notes
- Processing ophthalmological medical records
- Research in clinical NLP for ophthalmology
- Automated structuring of unstructured clinical text
Limitations
- Trained specifically for ophthalmological text and may not generalize to other medical domains
- Performance depends on text format and notation conventions
- Requires domain-specific tokenizer (EyeBERT) for optimal results
Citation
@misc{vabert_eyebert,
  title={VABERT-EyeBERT: Visual Acuity NER for Ophthalmology},
  author={Nguyen, Quang},
  year={2024},
  publisher={Hugging Face}
}
License
Apache 2.0
Contact
For questions or issues, please open an issue on the model repository.