NagaNLP NER (XLM-RoBERTa)

NagaNLP-NER is a Named Entity Recognition (NER) model for Nagamese (Naga Pidgin). It is fine-tuned from XLM-RoBERTa to identify four entity types: persons (PER), locations (LOC), organizations (ORG), and miscellaneous entities (MISC).

This model is part of the NagaNLP project, which aims to provide foundational NLP resources for the low-resource languages of Nagaland.

Model Details

Training Data

The model was fine-tuned on a manually annotated corpus containing 214 sentences (approx. 4,800 tokens).

  • Source: NagaNLP Conversational Corpus subset.
  • Tags: CoNLL-2003 format (PER, LOC, ORG, MISC).
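To make the tag format concrete, here is a minimal sketch of how CoNLL-style tags can be collapsed into entity spans. It assumes the annotations use B-/I- prefixes (the card only names the four entity types), and the function name `bio_to_spans` and the tokenization are illustrative. The tags on the sample sentence mirror the expected output shown under How to Get Started and are not taken from the corpus itself.

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        # Open a new span on "B-", or on a stray "I-" with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans

# Illustrative tokenization and tags (assumed, not copied from the corpus).
tokens = ["Etu", "retreating", "monsoon", "normally", "October", "mahina", "start", "hoi", "."]
tags = ["O", "O", "B-MISC", "O", "B-MISC", "O", "O", "O", "O"]

spans = bio_to_spans(tags)
for label, start, end in spans:
    print(label, " ".join(tokens[start:end]))
```

Spans are kept as token offsets so they can be compared directly against predictions made on the same tokenization.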

Intended Use

This model is intended for:

  • Extracting entities from Nagamese text.
  • Benchmarking multilingual models (like XLM-R) on extremely low-resource creole languages.
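For the benchmarking use case, a common choice is entity-level precision, recall, and F1 with exact span matching. The sketch below is one way to compute it, not the evaluation procedure used by the NagaNLP authors. The helper `entity_prf` and the toy gold and predicted spans are hypothetical and do not represent results from this model.

```python
from typing import Dict, Set, Tuple

Span = Tuple[str, int, int]  # (label, start token, end token exclusive)

def entity_prf(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    """Exact-match entity-level precision, recall, and F1."""
    tp = len(gold & pred)  # a prediction counts only if label and both boundaries match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy spans for illustration only (hypothetical, not model output).
gold = {("PER", 0, 2), ("LOC", 5, 6), ("MISC", 8, 9)}
pred = {("PER", 0, 2), ("LOC", 5, 7)}

scores = entity_prf(gold, pred)
print(scores)
```

Exact matching is strict: the predicted LOC span above shares a start with the gold span but has a different end, so it counts as an error.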

How to Get Started

You can use this model with the Hugging Face pipeline:

from transformers import pipeline

# Load the pipeline
ner_pipeline = pipeline("ner", model="agnivamaiti/naganlp-ner", aggregation_strategy="simple")

# Inference
text = "Etu retreating monsoon normally October mahina start hoi."
results = ner_pipeline(text)

# Print results
for entity in results:
    print(entity)
# Expected Output: {'entity_group': 'MISC', 'word': 'monsoon', ...}, {'entity_group': 'MISC', 'word': 'October', ...}
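With `aggregation_strategy="simple"`, each result is a dict with keys such as `entity_group`, `word`, `score`, `start`, and `end`. A minimal sketch of grouping those results by entity type follows. The helper name `group_by_type` is hypothetical, and the sample list is a hand-written stand-in for pipeline output with made-up scores, so the snippet runs without downloading the model.

```python
from collections import defaultdict
from typing import Dict, List

def group_by_type(entities: List[dict]) -> Dict[str, List[str]]:
    """Group aggregated NER results by their entity_group label."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Stand-in for ner_pipeline(text); the scores are invented for illustration.
sample = [
    {"entity_group": "MISC", "word": "monsoon", "score": 0.90, "start": 15, "end": 22},
    {"entity_group": "MISC", "word": "October", "score": 0.85, "start": 32, "end": 39},
]

grouped = group_by_type(sample)
print(grouped)
```

In practice, `sample` would be replaced with the `results` list returned by the pipeline call above.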

Limitations

  • Data Scarcity: Trained on a very small dataset (214 sentences). It serves as a baseline proof-of-concept and may struggle with vocabulary not seen during training.
  • Generalization: May perform poorly on dialects that differ significantly from the Kohima/Dimapur standard represented in the training corpus.
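Given these limitations, one possible safeguard is to separate low-confidence predictions for manual review. The sketch below assumes the aggregated pipeline output format (dicts with a `score` key). The function name `split_by_confidence`, the 0.5 threshold, and the sample predictions are arbitrary illustrations rather than recommendations from the model authors.

```python
from typing import List, Tuple

def split_by_confidence(entities: List[dict], threshold: float = 0.5) -> Tuple[List[dict], List[dict]]:
    """Split predictions into (confident, uncertain) by score."""
    confident = [e for e in entities if e["score"] >= threshold]
    uncertain = [e for e in entities if e["score"] < threshold]
    return confident, uncertain

# Hypothetical predictions; the words, labels, and scores are made up.
predictions = [
    {"entity_group": "LOC", "word": "Kohima", "score": 0.93},
    {"entity_group": "ORG", "word": "mahina", "score": 0.31},
]

confident, uncertain = split_by_confidence(predictions, threshold=0.5)
print([e["word"] for e in confident], [e["word"] for e in uncertain])
```

A suitable threshold would need to be tuned on held-out annotated data, which is scarce for this language.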

Citation

If you use this model, please cite the associated NagaNLP research paper: Citation details to be added upon publication.

Model size: 0.3B params (Safetensors, tensor type F32)