README: How to Use This Tool
1. Load models
Click “Load Models” and wait for all models to finish loading. WhisperX takes the longest to initialize, so allow extra time.
2. Upload input audio
Click “Input Audio” and upload the audio file you want to edit.
3. Transcribe and correct text
Click “Transcribe” to perform speech recognition. If the transcription is inaccurate, edit the text in “Original transcript”, then click “ReAlign” to recompute word-level timestamps.
4. (Optional) Denoise noisy audio
If noise in the input audio degrades recognition or synthesis quality, click “Denoise” to apply noise reduction. If you are not satisfied with the denoised result, click “Cancel Denoise” to restore the original audio, or select a different denoiser under “Select models” and reload the models.
5. Select the edit span
Use “First word to edit” and “Last word to edit” to specify the region to modify, then click “Check edit words” to preview the selection. For finer control, you may also adjust “Edit from time” and “Edit to time”.
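To illustrate how a word selection relates to the time fields, here is a minimal sketch. It assumes word-level timestamps are available as a list of entries with `word`, `start`, and `end` (in seconds), and that words are selected by 0-based index. The `Word` type and the `edit_span_times` function are hypothetical names for this example, not part of the tool.

```python
from typing import List, Tuple, TypedDict


class Word(TypedDict):
    # Hypothetical layout for one aligned word; times are in seconds.
    word: str
    start: float
    end: float


def edit_span_times(words: List[Word], first: int, last: int) -> Tuple[float, float]:
    """Return (start, end) covering words[first] through words[last], inclusive."""
    if not 0 <= first <= last < len(words):
        raise ValueError("need 0 <= first <= last < len(words)")
    # The span starts where the first selected word starts
    # and ends where the last selected word ends.
    return words[first]["start"], words[last]["end"]


# Made-up timestamps for illustration only.
words: List[Word] = [
    {"word": "the", "start": 0.10, "end": 0.25},
    {"word": "quick", "start": 0.25, "end": 0.60},
    {"word": "brown", "start": 0.60, "end": 0.95},
    {"word": "fox", "start": 0.95, "end": 1.30},
]

span_start, span_end = edit_span_times(words, 1, 2)  # select "quick brown"
print(span_start, span_end)
```

Adjusting “Edit from time” and “Edit to time” by hand corresponds to overriding the two values returned here, for example to include a breath or pause next to the selected words.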
6. Enter the new text
In the “Text” box, enter the text that should replace the selected segment.
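The effect of the replacement on the transcript can be sketched as follows. This is only an illustration of the idea, assuming whitespace-separated words and 0-based inclusive indices. The function name `replace_span` is hypothetical, and how the tool itself builds the edited transcript may differ.

```python
from typing import List


def replace_span(words: List[str], first: int, last: int, new_text: str) -> str:
    """Replace words[first] through words[last] (inclusive) with new_text."""
    if not 0 <= first <= last < len(words):
        raise ValueError("need 0 <= first <= last < len(words)")
    # Keep the words before and after the span; swap in the new words.
    return " ".join(words[:first] + new_text.split() + words[last + 1:])


original = "the quick brown fox".split()
edited = replace_span(original, 1, 2, "slow red")
print(edited)  # the slow red fox
```

The new text does not need to have the same number of words as the span it replaces.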
7. Run the edit
Click “Run” and wait for the model to generate the edited audio.
8. Inspect the result
The edited waveform will appear in “Output Audio”, and the corresponding edited text will be shown under “Inference transcript”.
9. Refine or change models
If the result is not satisfactory, try adjusting the “Generation Parameters” or selecting a different “Edit Model” under “Select models”, then run again.
10. Feedback
For bug reports or feature requests, feel free to:
1) Open a GitHub issue
2) Post on the Hugging Face community page
3) Contact us via email at approximetal@gmail.com