DistilBERT Task Router - Query Classification Model (V5)
An intent classification model based on DistilBERT, fine-tuned to classify user queries into 5 categories. It reaches 98.03% accuracy on a test set of 7,320 samples.
Model Description
- Base Model: distilbert-base-uncased (66M parameters)
- Task: Multi-class text classification (5 categories)
- Language: English
- Training Data: 58,560 samples (custom generated)
- Test Accuracy: 98.03%
- Inference Speed: ~3ms average latency
Categories
This model classifies text into 5 intent categories:
basic_actions - One-time, immediate commands
- Examples: "Turn on the lights", "Set temperature to 22 degrees", "Play music"
automator - Recurring, scheduled, or conditional automations
- Examples: "Turn on lights every day at 6pm", "AC on if temperature > 28", "Every morning at 8am, start coffee"
information - Educational, factual, or informational queries
- Examples: "What is quantum computing?", "How does photosynthesis work?", "What's the weather?"
conversation - Social interactions and casual chat
- Examples: "Hello", "How are you?", "Good morning", "Nice to meet you"
irrelevant - Abusive, meaningless, or off-topic content
- Examples: "asdfghjkl", "You're stupid", "Random gibberish"
Performance
Test Set Results (7,320 samples)
| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| basic_actions | 95.92% | 100.00% | 97.92% | 1,833 |
| automator | 100.00% | 94.50% | 97.17% | 1,418 |
| information | 100.00% | 95.39% | 97.64% | 1,432 |
| conversation | 100.00% | 100.00% | 100.00% | 1,456 |
| irrelevant | 94.71% | 100.00% | 97.28% | 1,181 |
| Overall (weighted avg) | 98.12% | 98.03% | 98.03% | 7,320 |
Key Metrics
- Accuracy: 98.03%
- F1 Score (Weighted): 98.03%
- F1 Score (Macro): 98.00%
- Error Rate: 1.97% (144 errors / 7,320 samples)
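The overall figures can be re-derived from the per-class rows. The sketch below recomputes accuracy, the error count, and the macro and weighted F1 from the table above. It assumes that each class's number of correct predictions equals its recall times its support, rounded to the nearest integer; the variable names are illustrative.

```python
# Per-class test results copied from the table above:
# (precision %, recall %, F1 %, support)
results = {
    "basic_actions": (95.92, 100.00, 97.92, 1833),
    "automator": (100.00, 94.50, 97.17, 1418),
    "information": (100.00, 95.39, 97.64, 1432),
    "conversation": (100.00, 100.00, 100.00, 1456),
    "irrelevant": (94.71, 100.00, 97.28, 1181),
}

total = sum(support for *_, support in results.values())

# Assumption: correct predictions per class = recall * support, rounded.
correct = sum(
    round(recall / 100 * support) for _, recall, _, support in results.values()
)
errors = total - correct
accuracy = 100 * correct / total

# Macro F1 weights every class equally; weighted F1 weights by support.
macro_f1 = sum(f1 for _, _, f1, _ in results.values()) / len(results)
weighted_f1 = sum(f1 * support for _, _, f1, support in results.values()) / total

print(f"samples={total} errors={errors} accuracy={accuracy:.2f}%")
print(f"macro F1={macro_f1:.2f}% weighted F1={weighted_f1:.2f}%")
```

Under that rounding assumption, the script reproduces the 144 errors and 98.03% accuracy reported above.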
Latency
- Average: 2.91ms
- Median: 2.80ms
- P95: 3.36ms
- P99: 3.88ms
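This card does not state how latency was measured or on which hardware. The sketch below shows one possible way to collect comparable statistics, assuming single-query wall-clock timing and nearest-rank percentiles. `measure_latency`, `percentile`, and the stand-in `predict` callable are hypothetical names, not part of this model's tooling.

```python
import math
import statistics
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def measure_latency(predict, queries, warmup=5):
    """Time predict(query) per call and summarize in milliseconds.

    `predict` is any callable taking one string (hypothetical here;
    in practice it would wrap the tokenizer and model call).
    """
    # Warm-up calls are excluded so one-time setup cost does not skew results.
    for query in queries[:warmup]:
        predict(query)

    timings_ms = []
    for query in queries:
        start = time.perf_counter()
        predict(query)
        timings_ms.append((time.perf_counter() - start) * 1000)

    timings_ms.sort()
    return {
        "average": statistics.mean(timings_ms),
        "median": statistics.median(timings_ms),
        "p95": percentile(timings_ms, 95),
        "p99": percentile(timings_ms, 99),
    }


# Stand-in predictor so the sketch runs without downloading the model.
stats = measure_latency(lambda text: text.lower(), ["Turn on the lights"] * 200)
print({name: f"{value:.4f}ms" for name, value in stats.items()})
```

Measured values depend heavily on device, batch size, and sequence length, so numbers from this sketch are not directly comparable to the figures above.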
Usage
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "SaiCharan7829/query_classification-distilBERT-66M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Prepare input
text = "Turn on the lights every evening at 6pm"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()
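# Categories mapping (index order must match the model's label ids;
# compare with model.config.id2label if the checkpoint defines label names)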
categories = ["basic_actions", "automator", "information", "conversation", "irrelevant"]
print(f"Predicted category: {categories[predicted_class]}")
# Output: Predicted category: automator
With Confidence Scores
import torch.nn.functional as F
# Get probabilities (reuses logits, predicted_class, and categories from Quick Start)
probs = F.softmax(logits, dim=1)[0]
confidence = probs[predicted_class].item()
print(f"Category: {categories[predicted_class]}")
print(f"Confidence: {confidence:.2%}")
# Show all probabilities
for i, category in enumerate(categories):
    print(f"{category}: {probs[i].item():.2%}")
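A router may want to avoid acting on low-confidence predictions. The sketch below falls back to a clarification step when the top probability is below a threshold. The 0.80 threshold, the `route_with_threshold` helper, and the `needs_clarification` label are assumptions for illustration and are not part of the model. Softmax is written in plain Python so the example runs without loading the model; in practice the probabilities would come from `F.softmax(logits, dim=1)` as above.

```python
import math

CATEGORIES = ["basic_actions", "automator", "information", "conversation", "irrelevant"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def route_with_threshold(logits, threshold=0.80):
    """Return (label, confidence); fall back when confidence is low.

    The threshold and the "needs_clarification" fallback label are
    assumptions for illustration and should be tuned on validation data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        return "needs_clarification", confidence
    return CATEGORIES[best], confidence


# Made-up logits: one confident prediction and one ambiguous one.
print(route_with_threshold([0.1, 6.0, 0.3, 0.2, 0.1]))
print(route_with_threshold([1.2, 1.0, 0.9, 1.1, 0.8]))
```

The first call returns the `automator` label, while the second falls back because no class clearly dominates.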
Training Details
Training Hyperparameters
- Epochs: 30
- Batch Size: 64 (effective, with gradient accumulation)
- Learning Rate: 2e-5
- Warmup Steps: 500
- Weight Decay: 0.01
- Label Smoothing: 0.1
- Learning Rate Schedule: Cosine with warmup
- Optimizer: AdamW
- Class Weights: Applied (automator: 1.31x, basic_actions: 1.48x, irrelevant: 0.98x)
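To illustrate the schedule listed above, the sketch below computes the learning rate at a given optimizer step for linear warmup followed by cosine decay. It assumes 915 optimizer steps per epoch (58,560 training samples divided by the effective batch size of 64), 27,450 steps over 30 epochs, and decay to zero at the final step. The card does not state these step counts directly, and `lr_at_step` is a hypothetical helper rather than the training code.

```python
import math

PEAK_LR = 2e-5
WARMUP_STEPS = 500
# Assumption: 58,560 samples / effective batch size 64 = 915 steps per epoch.
STEPS_PER_EPOCH = 58_560 // 64
TOTAL_STEPS = STEPS_PER_EPOCH * 30  # 30 epochs


def lr_at_step(step):
    """Learning rate after `step` optimizer updates."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


for step in (0, 250, 500, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at_step(step):.2e}")
```

With these assumptions, warmup covers a little over half of the first epoch, so nearly all of training runs on the cosine decay portion.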
Dataset
- Training Samples: 58,560
- Validation Samples: 7,320
- Test Samples: 7,320
- Data Split: 80% / 10% / 10% (train / validation / test)
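As a quick consistency check, the split percentages follow from the sample counts listed above. The total of 73,200 samples is derived here by summing the three splits; it is not stated in the card.

```python
# Split sizes as listed in the card.
splits = {"train": 58_560, "validation": 7_320, "test": 7_320}
total = sum(splits.values())  # derived, not stated in the card

for name, size in splits.items():
    print(f"{name}: {size} samples ({100 * size / total:.0f}%)")
print(f"total: {total}")
```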
Distribution:
- basic_actions: 24.4% (15,000 samples with 40% short commands)
- automator: 19.8%
- information: 19.7%
- conversation: 19.8%
- irrelevant: 16.4%
Training Infrastructure
- Framework: Transformers 4.x, PyTorch 2.x
- Device: Apple Silicon (MPS)
- Precision: FP32
Limitations & Biases
- The model is trained on English text only
- Performance may degrade on domain-specific jargon not seen during training
- Short ambiguous commands (1-2 words) may have lower confidence
- The "irrelevant" category groups abusive content with meaningless and off-topic input, so its predictions may reflect biases in the training data
Intended Use
This model is designed for:
- Smart home assistants and IoT platforms
- Chatbot intent classification
- Task routing and workflow automation
- Virtual assistant command parsing
Not recommended for:
- Sensitive content moderation (use dedicated safety models)
- Medical or legal decision-making
- Financial advice classification
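For the task-routing use case, the predicted category can select a downstream handler. The sketch below assumes a simple dispatch table keyed by category name. All handler functions and their return strings are hypothetical placeholders, and the category is passed in directly rather than computed by the model.

```python
from typing import Callable, Dict


# Hypothetical handlers; a real system would call device APIs, a scheduler, etc.
def run_action(query: str) -> str:
    return f"executing now: {query}"


def create_automation(query: str) -> str:
    return f"scheduling rule: {query}"


def answer_question(query: str) -> str:
    return f"looking up: {query}"


def chat(query: str) -> str:
    return f"replying to: {query}"


def ignore(query: str) -> str:
    return "ignored"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "basic_actions": run_action,
    "automator": create_automation,
    "information": answer_question,
    "conversation": chat,
    "irrelevant": ignore,
}


def route(category: str, query: str) -> str:
    """Dispatch a query to the handler for its predicted category."""
    handler = HANDLERS.get(category, ignore)  # unknown labels are ignored
    return handler(query)


print(route("automator", "Turn on the lights every evening at 6pm"))
```

Keeping the mapping in a single table makes it easy to add a category or swap a handler without touching the classification code.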
Version History
v5 (Current) - November 2024
- Accuracy: 98.03% (test set)
- Major improvements to basic_actions recall (100%)
- Optimized class weights based on error analysis
- Enhanced dataset with better short command coverage
v4
- Accuracy: 94.86% (test set)
- Initial release with 72k training samples
- Identified issues with short command classification
Citation
@misc{query_classification_distilbert_2024,
  author = {SaiCharan7829},
  title = {DistilBERT Task Router - Query Classification Model},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/SaiCharan7829/query_classification-distilBERT-66M}}
}
License
Apache 2.0
Model Card Authors
SaiCharan7829