# Stack Overflow Tag Recommender
## Model Description

This model is a fine-tuned version of distilbert-base-uncased for multi-label text classification on Stack Overflow posts. It predicts relevant tags for programming-related posts based on their title and content.

## Model Details
- Model Type: Multi-label Text Classification
- Base Model: distilbert-base-uncased
- Language: English
- Number of Labels: 20
- Framework: PyTorch + Transformers
- License: Apache 2.0
## Performance
| Metric | Value |
|---|---|
| Micro F1 | 0.583 |
| Macro F1 | 0.590 |
| Subset Accuracy | 0.165 |
| Hamming Loss | 0.092 |
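
These are the standard multi-label classification metrics. As a point of reference, here is a minimal sketch of how they can be reproduced with scikit-learn; the toy arrays below stand in for the real label matrices (which have 20 columns, one per tag), and the variable names are illustrative rather than taken from the evaluation code:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Toy multi-hot matrices: rows are posts, columns are tags
# (the real matrices have 20 columns, one per supported tag)
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])

print("Micro F1:       ", f1_score(y_true, y_pred, average="micro"))
print("Macro F1:       ", f1_score(y_true, y_pred, average="macro"))
print("Subset Accuracy:", accuracy_score(y_true, y_pred))  # exact-match ratio
print("Hamming Loss:   ", hamming_loss(y_true, y_pred))    # fraction of wrong labels
```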
## Available Tags
The model can predict the following 20 tags:
c#, java, javascript, jquery, ios, .net, php, html, c++, iphone, android, objective-c, asp.net, python, sql, mysql, css, ajax, c, database
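
The mapping from output indices to these tag names is stored with the model configuration. Assuming the `id2label` mapping was populated during fine-tuning (standard for Transformers classification heads), it can be inspected directly:

```python
from transformers import AutoConfig

# Print the index -> tag mapping saved with the model
# (assumes id2label was set during fine-tuning)
config = AutoConfig.from_pretrained("bonjourusman/tag-recommender")
print(config.id2label)
```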
## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bonjourusman/tag-recommender")
model = AutoModelForSequenceClassification.from_pretrained("bonjourusman/tag-recommender")

# Example prediction; truncate to the 330-token training length
text = "How do I connect to a MySQL database using Python?"
inputs = tokenizer(text, return_tensors="pt", max_length=330, truncation=True)

# Get per-tag probabilities
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Keep tags whose probability exceeds the threshold
threshold = 0.5
predicted_indices = (predictions > threshold).nonzero(as_tuple=True)[1]
print(f"Predicted tag indices: {predicted_indices.tolist()}")
```
## Training Details

### Training Configuration
- Epochs: 50
- Batch Size: 8
- Learning Rate: 5e-06
- Max Sequence Length: 330
- Optimizer: AdamW
- Loss Function: BCEWithLogitsLoss
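
For orientation, the sketch below shows one way this configuration maps onto a Transformers training loop. It is not the actual training script: the Stack Overflow data pipeline is not part of this card, so a single toy batch stands in for the real dataloader, and setting `problem_type="multi_label_classification"` is one standard way to get BCEWithLogitsLoss from a Transformers classification head.

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# problem_type wires BCEWithLogitsLoss into the classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=20,
    problem_type="multi_label_classification",
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
optimizer = AdamW(model.parameters(), lr=5e-6)

# Toy batch of 8 posts standing in for the real dataloader
texts = ["How do I connect to a MySQL database using Python?"] * 8
batch = tokenizer(texts, return_tensors="pt", max_length=330,
                  truncation=True, padding=True)
labels = torch.zeros(8, 20)   # multi-hot float targets, as BCE expects
labels[:, 0] = 1.0            # mark one toy tag as positive
batch["labels"] = labels

model.train()
for epoch in range(50):       # the real run trained for 50 epochs
    outputs = model(**batch)  # loss is BCEWithLogitsLoss over 20 tags
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```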
### Training Infrastructure
- Hardware: GPU (Google Colab)
- Framework: PyTorch + HuggingFace Transformers
## Limitations
- Domain Specificity: Trained specifically on Stack Overflow data
- Language: English only
- Tag Coverage: Limited to most frequent tags in training data
- Context Length: Maximum input length of 330 tokens
## Citation
```bibtex
@misc{tag-recommender,
  title={Stack Overflow Tag Recommendation using DistilBERT},
  year={2025},
  howpublished={HuggingFace Model Repository},
  url={https://huggingface.co/bonjourusman/tag-recommender}
}
```