DistilBERT Task Router - Query Classification Model (V5)

A high-performance intent classification model based on DistilBERT, fine-tuned to classify user queries into 5 categories, reaching 98.03% accuracy on a held-out test set of 7,320 samples.

Model Description

  • Base Model: distilbert-base-uncased (66M parameters)
  • Task: Multi-class text classification (5 categories)
  • Language: English
  • Training Data: 58,560 samples (custom generated)
  • Test Accuracy: 98.03% ✓
  • Inference Speed: ~3ms average latency

Categories

This model classifies text into 5 intent categories:

  1. basic_actions - One-time, immediate commands
     • Examples: "Turn on the lights", "Set temperature to 22 degrees", "Play music"
  2. automator - Recurring, scheduled, or conditional automations
     • Examples: "Turn on lights every day at 6pm", "AC on if temperature > 28", "Every morning at 8am, start coffee"
  3. information - Educational, factual, or informational queries
     • Examples: "What is quantum computing?", "How does photosynthesis work?", "What's the weather?"
  4. conversation - Social interactions and casual chat
     • Examples: "Hello", "How are you?", "Good morning", "Nice to meet you"
  5. irrelevant - Abusive, meaningless, or off-topic content
     • Examples: "asdfghjkl", "You're stupid", "Random gibberish"
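
The category-to-index mapping is baked into the checkpoint's config. As a minimal sketch of how such a label schema could be attached when fine-tuning the base model (the index order here is an assumption, chosen to match the mapping used in the Usage section below):

from transformers import AutoModelForSequenceClassification

# Assumed index order; must match whatever order was used to encode the labels
id2label = {0: "basic_actions", 1: "automator", 2: "information", 3: "conversation", 4: "irrelevant"}
label2id = {label: i for i, label in id2label.items()}

# Attaching the schema lets downstream tools (e.g. pipelines) report category names
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=5,
    id2label=id2label,
    label2id=label2id,
)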

Performance

Test Set Results (7,320 samples)

Category        Precision   Recall     F1-Score   Support
basic_actions     95.92%    100.00%     97.92%     1,833
automator        100.00%     94.50%     97.17%     1,418
information      100.00%     95.39%     97.64%     1,432
conversation     100.00%    100.00%    100.00%     1,456
irrelevant        94.71%    100.00%     97.28%     1,181
Overall           98.12%     98.03%     98.03%     7,320

Key Metrics

  • Accuracy: 98.03%
  • F1 Score (Weighted): 98.03%
  • F1 Score (Macro): 98.00%
  • Error Rate: 1.97% (144 errors / 7,320 samples)
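
These aggregate and per-class numbers can be reproduced from raw predictions with scikit-learn. A minimal sketch (labels and preds stand in for the true and predicted class ids over the 7,320 test samples; toy values are used here for illustration):

from sklearn.metrics import accuracy_score, classification_report, f1_score

categories = ["basic_actions", "automator", "information", "conversation", "irrelevant"]

# Toy stand-ins; in practice these come from running the model over the test set
labels = [0, 1, 2, 3, 4, 0]
preds = [0, 1, 2, 3, 4, 1]

print(f"Accuracy: {accuracy_score(labels, preds):.4f}")
print(f"F1 (weighted): {f1_score(labels, preds, average='weighted'):.4f}")
print(f"F1 (macro): {f1_score(labels, preds, average='macro'):.4f}")

# Per-class precision / recall / F1 / support, as in the table above
print(classification_report(labels, preds, target_names=categories, digits=4))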

Latency

  • Average: 2.91ms
  • Median: 2.80ms
  • P95: 3.36ms
  • P99: 3.88ms
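
A minimal sketch of how single-query latency like this can be measured, timing only the forward pass after warm-up (the run counts and default CPU placement are assumptions):

import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "SaiCharan7829/query_classification-distilBERT-66M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Turn on the lights", return_tensors="pt", truncation=True)

# Warm-up so one-time setup costs don't skew the numbers
for _ in range(10):
    with torch.no_grad():
        model(**inputs)

# Time repeated forward passes, in milliseconds
timings = []
for _ in range(200):
    start = time.perf_counter()
    with torch.no_grad():
        model(**inputs)
    timings.append((time.perf_counter() - start) * 1000)

timings.sort()
print(f"Average: {sum(timings) / len(timings):.2f}ms")
print(f"Median: {timings[len(timings) // 2]:.2f}ms")
print(f"P95: {timings[int(len(timings) * 0.95)]:.2f}ms")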

Usage

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "SaiCharan7829/query_classification-distilBERT-66M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Turn on the lights every evening at 6pm"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

# Category id-to-label mapping (order must match the checkpoint's id2label config)
categories = ["basic_actions", "automator", "information", "conversation", "irrelevant"]
print(f"Predicted category: {categories[predicted_class]}")
# Output: Predicted category: automator
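
Equivalently, the transformers pipeline API wraps tokenization, inference, and softmax in a single call. Note the returned label names assume the checkpoint's id2label config is set; otherwise they appear as LABEL_0 ... LABEL_4:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SaiCharan7829/query_classification-distilBERT-66M",
)

result = classifier("Turn on the lights every evening at 6pm")[0]
print(result)  # e.g. {'label': 'automator', 'score': 0.98}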

With Confidence Scores

# Continues from the Quick Start snippet above (reuses logits and predicted_class)
import torch.nn.functional as F

# Get probabilities
probs = F.softmax(logits, dim=1)[0]
confidence = probs[predicted_class].item()

print(f"Category: {categories[predicted_class]}")
print(f"Confidence: {confidence:.2%}")

# Show all probabilities
for i, category in enumerate(categories):
    print(f"{category}: {probs[i].item():.2%}")

Training Details

Training Hyperparameters

  • Epochs: 30
  • Batch Size: 64 (effective, with gradient accumulation)
  • Learning Rate: 2e-5
  • Warmup Steps: 500
  • Weight Decay: 0.01
  • Label Smoothing: 0.1
  • Learning Rate Schedule: Cosine with warmup
  • Optimizer: AdamW
  • Class Weights: Applied (automator: 1.31x, basic_actions: 1.48x, irrelevant: 0.98x)
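
A minimal sketch of how these hyperparameters could map onto the transformers Trainer, with the class weights applied through a weighted cross-entropy loss. The dataset objects, the batch-size/accumulation split, and the two unlisted class weights are assumptions:

import torch
from transformers import Trainer, TrainingArguments

# Weights in id order (basic_actions, automator, information, conversation, irrelevant);
# the information and conversation weights are not reported above, so 1.0 is a placeholder
class_weights = torch.tensor([1.48, 1.31, 1.0, 1.0, 0.98])

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=class_weights.to(outputs.logits.device),
            label_smoothing=0.1,
        )
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(
    output_dir="task-router-v5",
    num_train_epochs=30,
    per_device_train_batch_size=32,   # 32 x 2 accumulation = effective batch of 64
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
    lr_scheduler_type="cosine",       # cosine schedule with warmup; AdamW is the Trainer default
)

# model: DistilBERT configured for 5 labels; train_ds / val_ds: tokenized datasets (assumed)
trainer = WeightedTrainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()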

Dataset

  • Training Samples: 58,560
  • Validation Samples: 7,320
  • Test Samples: 7,320
  • Data Split: 80% / 10% / 10%
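
One way to produce such an 80/10/10 split with stratification, so every category keeps its share in each split; a sketch with synthetic stand-ins for the real corpus:

from sklearn.model_selection import train_test_split

# Synthetic stand-ins; in practice texts/labels hold the full 73,200-sample corpus
texts = [f"query {i}" for i in range(100)]
labels = [i % 5 for i in range(100)]

# Carve off 20%, then split it half-and-half into validation and test
train_x, rest_x, train_y, rest_y = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=42
)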

Distribution:

  • basic_actions: 24.4% (15,000 generated samples, 40% of them short commands)
  • automator: 19.8%
  • information: 19.7%
  • conversation: 19.8%
  • irrelevant: 16.4%

Training Infrastructure

  • Framework: Transformers 4.x, PyTorch 2.x
  • Device: Apple Silicon (MPS)
  • Precision: FP32

Limitations & Biases

  • The model is trained on English text only
  • Performance may degrade on domain-specific jargon not seen during training
  • Short ambiguous commands (1-2 words) may have lower confidence (see the thresholding sketch after this list)
  • The "irrelevant" category includes abusive content, which may reflect biases in training data

Intended Use

This model is designed for:

  • Smart home assistants and IoT platforms
  • Chatbot intent classification
  • Task routing and workflow automation
  • Virtual assistant command parsing

Not recommended for:

  • Sensitive content moderation (use dedicated safety models)
  • Medical or legal decision-making
  • Financial advice classification

Version History

v5 (Current) - November 2024

  • Accuracy: 98.03% (test set)
  • Major improvements to basic_actions recall (100%)
  • Optimized class weights based on error analysis
  • Enhanced dataset with better short command coverage

v4

  • Accuracy: 94.86% (test set)
  • Initial release with 72k training samples
  • Identified issues with short command classification

Citation

@misc{query_classification_distilbert_2024,
  author = {SaiCharan7829},
  title = {DistilBERT Task Router - Query Classification Model},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/SaiCharan7829/query_classification-distilBERT-66M}}
}

License

Apache 2.0

Model Card Authors

SaiCharan7829
