Multilingual Indonesian & English — XLM-RoBERTa
This repository provides a fine-tuned XLM-RoBERTa model for MULTILABEL HATE CONTENT DETECTION in social media text.
The model is designed to identify Hate Speech and Abusive Language simultaneously across Indonesian, regional Indonesian languages, and English, particularly in noisy and informal online conversations.
Performance metrics are reported on a held-out validation set.
| Metric | Score |
|---|---|
| Precision | 0.9249 |
| Recall | 0.9300 |
| F1 (Macro) | 0.9274 |
| F1 (Weighted) | 0.9269 |
| Training Loss | 0.1181 |
| Validation Loss | 0.2070 |
(Exact scores may vary depending on evaluation split and threshold.)
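To illustrate why the reported scores depend on the decision threshold, here is a minimal sketch of per-label precision, recall, and macro F1 for a two-label multilabel setup. The toy labels and scores are hypothetical and are not taken from the model's validation set; a single shared threshold is also an assumption, since the card does not state which threshold was used.

```python
from typing import Dict, List

LABELS = ["HATESPEECH", "ABUSIVE"]


def binarize(scores: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label scores into 0/1 decisions using one shared threshold."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def per_label_metrics(y_true: List[List[int]], y_pred: List[List[int]], i: int) -> Dict[str, float]:
    """Precision, recall, and F1 for the label at column i."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 1 and p[i] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 0 and p[i] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 1 and p[i] == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    f1s = [per_label_metrics(y_true, y_pred, i)["f1"] for i in range(len(LABELS))]
    return sum(f1s) / len(f1s)


# Hypothetical toy data: columns follow LABELS order.
y_true = [[1, 1], [0, 1], [0, 0], [1, 0]]
scores = [[0.91, 0.98], [0.40, 0.85], [0.05, 0.55], [0.62, 0.10]]

# The same scores give different macro F1 values at different thresholds.
for threshold in (0.5, 0.6):
    y_pred = binarize(scores, threshold)
    print(threshold, round(macro_f1(y_true, y_pred), 4))
```

In this toy data, the third example's ABUSIVE score of 0.55 is a false positive at a threshold of 0.5 and a correct negative at 0.6, which is enough to change the macro F1.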
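```bash
pip install transformers torch
```

```python
from transformers import pipeline

# return_all_scores=True requests a score for every label.
# Newer transformers releases deprecate it in favor of top_k=None.
classifier = pipeline(
    task="text-classification",
    model="nahiar/hatespeech-abusive-xlm-roberta-v1",
    return_all_scores=True
)

result = classifier("Dasar bodoh, otak udang!")
print(result)
```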
Output
```
[
  {'label': 'HATESPEECH', 'score': 0.9123},
  {'label': 'ABUSIVE', 'score': 0.9841}
]
```
Because this is a multilabel model, more than one label can be active for a single input.
- HATESPEECH → Content that attacks or demeans a group based on identity
- ABUSIVE → Insulting, offensive, or aggressive language that does not target a protected group
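Since each label is scored independently, turning scores into decisions usually means applying a cutoff per label. The sketch below is one way to do that; the helper name `active_labels` and the default threshold of 0.5 are assumptions for illustration, not values documented for this model.

```python
from typing import List


def active_labels(preds: List[dict], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score meets the threshold (hypothetical helper)."""
    return [p["label"] for p in preds if p["score"] >= threshold]


# Scores copied from the example output above.
example = [
    {"label": "HATESPEECH", "score": 0.9123},
    {"label": "ABUSIVE", "score": 0.9841},
]

print(active_labels(example))                  # both labels pass the default cutoff
print(active_labels(example, threshold=0.95))  # a stricter cutoff keeps fewer labels
```

A stricter threshold trades recall for precision, so the right value depends on how costly false positives are in the target moderation workflow.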
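```python
# Reuses the `classifier` pipeline created in the example above.
texts = [
    "Dasar kaum ini selalu bikin rusuh",
    "Kamu memang bodoh dan tidak berguna",
    "Saya tidak setuju dengan pendapat kamu"
]

results = classifier(texts)

# Each entry in `results` holds the per-label scores for one input text.
for text, preds in zip(texts, results):
    labels = [(p["label"], round(p["score"], 4)) for p in preds]
    print(text, "→", labels)
```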
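For downstream analysis it can be convenient to flatten batch results into one record per text. The following is a rough sketch using only the standard library; the `to_rows` helper is hypothetical and the scores are made up so that the example runs without downloading the model.

```python
import csv
import io
from typing import List


def to_rows(texts: List[str], results: List[List[dict]]) -> List[dict]:
    """Build one flat record per text, with a column for each label score."""
    rows = []
    for text, preds in zip(texts, results):
        row = {"text": text}
        for p in preds:
            row[p["label"]] = round(p["score"], 4)
        rows.append(row)
    return rows


# Hypothetical scores in the same shape as the batch output above.
texts = [
    "Kamu memang bodoh dan tidak berguna",
    "Saya tidak setuju dengan pendapat kamu",
]
results = [
    [{"label": "HATESPEECH", "score": 0.12}, {"label": "ABUSIVE", "score": 0.97}],
    [{"label": "HATESPEECH", "score": 0.02}, {"label": "ABUSIVE", "score": 0.03}],
]

rows = to_rows(texts, results)

# Write the records as CSV to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "HATESPEECH", "ABUSIVE"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```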
| Parameter | Value |
|---|---|
| Base Model | xlm-roberta-base |
| Task Type | Multilabel Classification |
| Training Strategy | Fine-tuning |
| Epochs | Multiple |
| Learning Rate | 2e-5 |
| Batch Size | 16 |
| Training Date | 2025-12-18 |
This model is released under the Apache License 2.0 and is free for research and commercial use.
```bibtex
@misc{djunaedi2025hatespeech_multilabel,
  author    = {Raihan Hidayatulloh Djunaedi},
  title     = {Multilabel Hate Speech and Abusive Language Detection for Social Media Text},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/nahiar/hatespeech-xlmr-v4}
}
```
Base model
FacebookAI/xlm-roberta-base