# AI Engine Model Summary

## Simplified ASR-Only Configuration

This engine has been simplified to use **only** the IndicWav2Vec Hindi model for Automatic Speech Recognition (ASR).

---

## Active Model

### 1. IndicWav2Vec Hindi (Primary and Only Model)

- **Model ID**: `ai4bharat/indicwav2vec-hindi`
- **Type**: `Wav2Vec2ForCTC`
- **Purpose**: Automatic Speech Recognition (ASR) for Hindi and other Indian languages
- **Status**: ✅ Active (loaded at startup)
- **Location**: `detect_stuttering.py` lines 26, 148-156
- **Authentication**: Requires the `HF_TOKEN` environment variable

**Features:**

- Speech-to-text transcription
- Confidence scoring from model predictions
- Text-based stutter analysis (simple repetition detection)

---

## Removed Models

The following models have been **removed** to simplify the engine:

1. ❌ **MMS Language Identification (LID)** (`facebook/mms-lid-126`)
   - Previously used for language detection
   - No longer needed: IndicWav2Vec handles Hindi natively
2. ❌ **Isolation Forest** (sklearn)
   - Previously used for anomaly detection
   - Replaced by simple text-based analysis

---

## Removed Libraries

The following signal processing and machine learning libraries are no longer used:

- ❌ `parselmouth` (Praat): voice quality analysis
- ❌ `fastdtw`: repetition detection via DTW
- ❌ `sklearn`: machine learning algorithms
- ❌ Complex acoustic feature extraction (MFCC, formants, etc.)
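
The text-based stutter analysis that replaced the acoustic tooling works on the ASR transcript rather than on the audio. The engine's own implementation in `detect_stuttering.py` is not shown in this summary, so the following is only a minimal sketch, assuming that "repetition" means the same whitespace-separated word occurring back to back. `find_word_repetitions` and `Repetition` are hypothetical names. The sketch tokenizes on whitespace because Python's regex `\w` does not match Devanagari vowel signs (matras), so a `\w+` pattern would split Hindi words apart.

```python
import string
from dataclasses import dataclass

# Punctuation stripped from token edges; includes the Devanagari danda (।) and double danda (॥).
_PUNCT = string.punctuation + "।॥"


@dataclass
class Repetition:
    """Hypothetical result type: one run of a consecutively repeated word."""
    word: str         # normalized repeated word
    start_index: int  # token index where the run begins
    count: int        # how many times the word occurs back to back


def find_word_repetitions(transcript: str, min_count: int = 2) -> list[Repetition]:
    """Hypothetical helper: find runs of the same word repeated consecutively.

    Assumes repetitions are identical whitespace-separated words after
    stripping edge punctuation and case-folding.
    """
    tokens = [t.strip(_PUNCT).casefold() for t in transcript.split()]
    runs: list[Repetition] = []
    i = 0
    while i < len(tokens):
        j = i
        # Extend j while the next token equals the token that started the run.
        while j + 1 < len(tokens) and tokens[j + 1] == tokens[i]:
            j += 1
        count = j - i + 1
        # Skip empty tokens (e.g. a lone punctuation mark) and short runs.
        if tokens[i] and count >= min_count:
            runs.append(Repetition(tokens[i], i, count))
        i = j + 1
    return runs


if __name__ == "__main__":
    for rep in find_word_repetitions("मैं मैं मैं घर जा रहा हूँ, हूँ।"):
        print(rep)
```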
---

## Current Pipeline

```
Audio Input
    ↓
IndicWav2Vec Hindi ASR
    ↓
Text Transcription
    ↓
Basic Text Analysis
    ↓
Results (transcript + simple stutter detection)
```

---

## API Response Format

The simplified engine returns:

```json
{
  "actual_transcript": "transcribed text",
  "target_transcript": "expected text (if provided)",
  "mismatched_chars": ["timestamps of low confidence regions"],
  "mismatch_percentage": 0.0,
  "ctc_loss_score": 0.0,
  "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
  "total_stutter_duration": 0.0,
  "stutter_frequency": 0.0,
  "severity": "none|mild|moderate|severe",
  "confidence_score": 0.8,
  "speaking_rate_sps": 0.0,
  "analysis_duration_seconds": 0.0,
  "model_version": "indicwav2vec-hindi-asr-v1"
}
```

---

## Dependencies

**Required:**

- `transformers` 4.35.0: IndicWav2Vec model
- `torch` 2.0.1: PyTorch backend
- `librosa` ≥0.10.0: audio loading (resampled to 16 kHz)
- `numpy`: array operations

**Optional (legacy methods only; not used in ASR mode):**

- `parselmouth`: voice quality
- `fastdtw`: DTW algorithm
- `sklearn`: ML algorithms

---

## Usage

```python
from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

detector = get_stutter_detector()
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="expected text",  # optional
    language="hindi",                   # default: hindi
)
print(result['actual_transcript'])  # ASR transcription
```

---

## Notes

- The engine focuses **only** on ASR transcription
- Stutter detection is simplified to text-based repetition analysis
- No complex acoustic feature extraction
- Faster and lighter than the previous multi-model approach
- Optimized for Hindi but can handle other Indian languages
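
Between the "IndicWav2Vec Hindi ASR" and "Text Transcription" steps of the pipeline, `Wav2Vec2ForCTC` produces one score vector per audio frame (roughly every 20 ms of 16 kHz audio), and the transcript is read off those frames by CTC decoding. With `transformers` this is usually done by taking the argmax ids and calling `Wav2Vec2Processor.batch_decode`. The plain-Python sketch below shows what greedy CTC decoding does and pairs it with one possible definition of `confidence_score` (the mean probability of non-blank frame predictions). That confidence definition, the toy vocabulary, and the function names are assumptions for illustration, not the engine's code.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_decode(
    frame_logits: list[list[float]],
    vocab: list[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> tuple[str, float]:
    """Greedy (best-path) CTC decoding with an assumed confidence score.

    1. Take the argmax label in every frame.
    2. Merge consecutive repeats of the same label.
    3. Drop the blank label; map the word delimiter to a space.
    Confidence (assumption): mean argmax probability over non-blank frames.
    """
    out: list[str] = []
    kept_probs: list[float] = []
    prev = None
    for logits in frame_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if best != blank_id:
            kept_probs.append(probs[best])
            if best != prev:  # a new label, not a repeat of the previous frame
                out.append(vocab[best])
        prev = best  # a blank between two equal labels resets the merge
    text = "".join(out).replace(word_delimiter, " ").strip()
    confidence = sum(kept_probs) / len(kept_probs) if kept_probs else 0.0
    return text, confidence


if __name__ == "__main__":
    vocab = ["<pad>", "|", "a", "b"]  # toy vocabulary; index 0 is the CTC blank

    def one_hot(i: int) -> list[float]:
        """Hypothetical helper: logits strongly favouring label i."""
        return [5.0 if k == i else 0.0 for k in range(len(vocab))]

    frames = [one_hot(i) for i in [2, 2, 0, 2, 3, 1, 3]]
    text, conf = greedy_ctc_decode(frames, vocab)
    print(text, round(conf, 4))
```

The blank frame between the second and third `a` frames is what lets CTC emit a genuine double letter (`aa`); without it, consecutive identical predictions collapse into a single character.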