AJ50 committed on
Commit 6eb130f · verified · 1 Parent(s): 518bb33

Add encoder model with config and documentation

Files changed (3)
  1. Readme.md +28 -0
  2. config.json +23 -0
  3. encoder.pt +3 -0
Readme.md ADDED
@@ -0,0 +1,28 @@
+ # Encoder Model
+
+ This directory contains the pre-trained encoder model for voice conversion.
+
+ ## Model Details
+ - **File**: `encoder.pt`
+ - **Size**: ~17.1 MB
+ - **Input**: Audio waveform
+ - **Output**: Speaker embeddings
+
+ ## Usage
+ ```python
+ # Load the encoder model (requires `import torch`)
+ encoder = torch.load('encoder.pt')
+ encoder.eval()
+
+ # Process audio
+ with torch.no_grad():
+     embedding = encoder(audio_tensor)
+ ```
+
+ ## Dependencies
+ - PyTorch
+ - NumPy
+ - Librosa (for audio processing)
+
+ ## Model Configuration
+ See `config.json` for model architecture and training parameters.
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+ "model_type": "speaker_encoder",
+ "architecture": "LSTM",
+ "input_dim": 40,
+ "hidden_dim": 256,
+ "num_layers": 3,
+ "output_dim": 256,
+ "dropout": 0.1,
+ "sample_rate": 16000,
+ "window_size": 0.04,
+ "window_stride": 0.01,
+ "n_mels": 40,
+ "embedding_size": 256,
+ "prenet_dims": [256, 256],
+ "lstm_dims": 256,
+ "num_lstm_layers": 3,
+ "speaker_embedding_size": 256,
+ "use_cuda": true,
+ "model_name": "speaker_encoder",
+ "version": "1.0",
+ "authors": ["Arjit"],
+ "description": "Speaker encoder model for voice conversion tasks"
+ }
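The windowing values in `config.json` (`sample_rate` 16000 Hz, `window_size` 0.04 s, `window_stride` 0.01 s) determine how a waveform is framed before the 40 mel features are computed. A minimal sketch of that frame arithmetic, in pure Python with an illustrative helper name:

```python
# Frame arithmetic implied by config.json: a 0.04 s window and 0.01 s stride
# at 16 kHz give 640-sample windows hopped every 160 samples.

def frame_counts(num_samples, sample_rate=16000, window_size=0.04, window_stride=0.01):
    """Return (window_length, hop_length, num_frames) for a waveform."""
    win = int(sample_rate * window_size)    # 640 samples per window
    hop = int(sample_rate * window_stride)  # 160 samples per hop
    if num_samples < win:
        return win, hop, 0
    frames = 1 + (num_samples - win) // hop
    return win, hop, frames

# One second of 16 kHz audio -> 640-sample windows, 160-sample hop, 97 frames
print(frame_counts(16000))
```

Each frame would then be reduced to `n_mels` (40) mel-filterbank values, matching the encoder's `input_dim` of 40.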
encoder.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39373b86598fa3da9fcddee6142382efe09777e8d37dc9c0561f41f0070f134e
+ size 17090379
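`encoder.pt` is committed as a Git LFS pointer; the actual weights are fetched on checkout. One way to confirm a downloaded file matches the pointer's `oid` and `size` is to recompute the SHA-256 with the standard library (the function name is illustrative):

```python
import hashlib

def verify_lfs_object(path, expected_oid, expected_size):
    """Check a downloaded file against its Git LFS pointer (sha256 oid + byte size)."""
    h = hashlib.sha256()
    size = 0
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):  # stream in 1 MiB chunks
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest() == expected_oid and size == expected_size

# Example, using the values from the pointer above:
# verify_lfs_object('encoder.pt',
#                   '39373b86598fa3da9fcddee6142382efe09777e8d37dc9c0561f41f0070f134e',
#                   17090379)
```

A `False` result usually means the LFS object was never fetched and the file on disk is still the small text pointer itself.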