Add encoder model with config and documentation

Files changed (3)

Readme.md ADDED

+# Encoder Model
+This directory contains the pre-trained encoder model for voice conversion.
+## Model Details
+- **File**: `encoder.pt`
+- **Size**: ~17.1 MB
+- **Input**: Audio waveform
+- **Output**: Speaker embeddings
+## Usage
+```python
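+import torch
+
+# Load the encoder model (map_location='cpu' lets this run without a GPU)
+encoder = torch.load('encoder.pt', map_location='cpu')
+encoder.eval()
+# Process audio; audio_tensor must be prepared by the caller
+with torch.no_grad():
+    embedding = encoder(audio_tensor)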
+```
+## Dependencies
+- PyTorch
+- NumPy
+- Librosa (for audio processing)
+## Model Configuration
+See `config.json` for model architecture and training parameters.

config.json ADDED

+{
+  "model_type": "speaker_encoder",
+  "architecture": "LSTM",
+  "input_dim": 40,
+  "hidden_dim": 256,
+  "num_layers": 3,
+  "output_dim": 256,
+  "dropout": 0.1,
+  "sample_rate": 16000,
+  "window_size": 0.04,
+  "window_stride": 0.01,
+  "n_mels": 40,
+  "embedding_size": 256,
+  "prenet_dims": [256, 256],
+  "lstm_dims": 256,
+  "num_lstm_layers": 3,
+  "speaker_embedding_size": 256,
+  "use_cuda": true,
+  "model_name": "speaker_encoder",
+  "version": "1.0",
+  "authors": ["Arjit"],
+  "description": "Speaker encoder model for voice conversion tasks"
+}
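
The framing values in this config can be turned into sample counts, which is what most feature extractors expect. The sketch below is only an illustration: it assumes `window_size` and `window_stride` are given in seconds (the config does not state the units), and the helpers `frame_params` and `num_frames` are hypothetical names, not part of this commit.

```python
import json

# Values copied from the config.json in this diff; only the keys used below.
CONFIG_TEXT = """
{
  "sample_rate": 16000,
  "window_size": 0.04,
  "window_stride": 0.01,
  "n_mels": 40,
  "input_dim": 40
}
"""


def frame_params(cfg):
    """Convert window size and stride to samples (assumes they are in seconds)."""
    sample_rate = cfg["sample_rate"]
    win_length = round(cfg["window_size"] * sample_rate)
    hop_length = round(cfg["window_stride"] * sample_rate)
    return win_length, hop_length


def num_frames(num_samples, win_length, hop_length):
    """Count the full windows that fit in a signal when no padding is applied."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


cfg = json.loads(CONFIG_TEXT)
win_length, hop_length = frame_params(cfg)

# The mel feature count should match the encoder's input dimension.
features_match = cfg["n_mels"] == cfg["input_dim"]

print(win_length, hop_length, num_frames(cfg["sample_rate"], win_length, hop_length))
```

Under that assumption, one second of 16 kHz audio yields 640-sample windows with a 160-sample hop. A feature extractor that pads the signal would produce a different frame count.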

encoder.pt ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:39373b86598fa3da9fcddee6142382efe09777e8d37dc9c0561f41f0070f134e
+size 17090379
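
The three lines above are a Git LFS pointer, not the model weights themselves. After downloading the real `encoder.pt`, its size and SHA-256 digest can be compared against the pointer. The following is a minimal sketch using only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever the caller supplies.

```python
import hashlib
import os

# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:39373b86598fa3da9fcddee6142382efe09777e8d37dc9c0561f41f0070f134e
size 17090379
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and digest the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

A call such as `matches_pointer("encoder.pt", pointer)` would then report whether a local copy is the file this commit refers to. The recorded size of 17,090,379 bytes agrees with the "~17.1 MB" stated in the README.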