XEUS tops the [ML-SUPERB]() multilingual speech recognition leaderboard, outperforming [MMS](), [w2v-BERT 2.0](), and [XLS-R](). XEUS also sets a new state-of-the-art on 4 tasks in the monolingual [SUPERB]() benchmark.

## Requirements

The code for XEUS is still being merged into the main ESPnet repository. In the meantime, it can be installed from the following fork:

```
pip install -e git+https://github.com/wanchichen/espnet.git@ssl#egg=espnet
```

XEUS supports [FlashAttention](), which can be installed as follows:

```
pip install flash-attn --no-build-isolation
```

## Usage

```
import torch
import soundfile as sf
from torch.nn.utils.rnn import pad_sequence

from espnet2.tasks.ssl import SSLTask

device = "cuda" if torch.cuda.is_available() else "cpu"

xeus_model, xeus_train_args = SSLTask.build_model_from_file(
    config_file=None,
    model_file='/path/to/checkpoint/here/checkpoint.pth',
    device=device,
)

wav, sampling_rate = sf.read('/path/to/audio.wav')  # sampling rate should be 16000
wav = torch.tensor(wav, dtype=torch.float)
wav_lengths = torch.LongTensor([len(wav)]).to(device)
wavs = pad_sequence([wav], batch_first=True).to(device)

# we recommend use_mask=True during fine-tuning
feats = xeus_model.encode(wavs, wav_lengths, use_mask=False, use_final_output=False)[0][-1]  # take the output of the last layer
```
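The `[0][-1]` indexing above keeps only the final layer, but `encode` exposes every layer's hidden states, and downstream tasks often pool them with a learned weighted sum instead (SUPERB-style probing). A minimal sketch of that pooling, with random tensors standing in for the per-layer XEUS features (the layer count and feature size here are illustrative, not the model's):

```python
import torch

# Stand-in for the per-layer features returned by xeus_model.encode(...)[0]:
# a list of [batch, time, dim] tensors, one per encoder layer (dummy sizes).
num_layers, batch, time, dim = 3, 1, 50, 8
layer_feats = [torch.randn(batch, time, dim) for _ in range(num_layers)]

# One learnable scalar per layer, normalized to a convex combination.
layer_weights = torch.nn.Parameter(torch.zeros(num_layers))
norm_weights = torch.softmax(layer_weights, dim=0)

# Weighted sum across layers keeps the [batch, time, dim] shape.
pooled = sum(w * f for w, f in zip(norm_weights, layer_feats))
print(pooled.shape)  # torch.Size([1, 50, 8])
```

Because the weights are a `Parameter`, they can simply be added to the downstream model's optimizer and trained jointly with the probe head.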
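The usage example assumes the input audio is already at 16 kHz. If it is not, resample it before calling `encode`; a minimal sketch using `scipy.signal.resample_poly` on a synthetic tone (the rates and the tone are illustrative; `torchaudio`'s `resample` would work equally well):

```python
import numpy as np
from scipy.signal import resample_poly

orig_sr, target_sr = 44100, 16000

# One second of a 440 Hz sine as stand-in audio at the original rate.
t = np.arange(orig_sr) / orig_sr
wav = np.sin(2 * np.pi * 440.0 * t)

# Polyphase resampling by the rational factor target_sr / orig_sr.
wav_16k = resample_poly(wav, target_sr, orig_sr)
print(len(wav_16k))  # 16000 samples: one second at 16 kHz
```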
## Results