---
title: OpenAI Whisper Vs Alibaba SenseVoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
short_description: Compare OpenAI Whisper against FunAudioLLM SenseVoice.
---
# OpenAI Whisper vs. Alibaba SenseVoice Comparison

This Space lets you compare **faster-whisper** models against Alibaba FunAudioLLM's **SenseVoice** models for automatic speech recognition (ASR), featuring:

* Multiple faster-whisper and SenseVoice model choices.
* Language selection for each ASR engine, with the full list of supported language codes.
* Explicit device selection (GPU or CPU) with ZeroGPU support via the `spaces.GPU` decorator (see the sketch after this list).
* Speaker diarization with `pyannote.audio`, displaying speaker-labeled transcripts.
* Simplified Chinese to Traditional Chinese conversion via `opencc`.
* Color-coded and scrollable diarized transcript panel.
* Semi-streaming, semi-real-time output: plain and speaker-labeled transcript segments accumulate live as each segment or speaker turn finishes processing.
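The lazy GPU pattern referenced above typically looks like the sketch below. `transcribe_gpu` / `transcribe_cpu` are illustrative names rather than the actual functions in app.py, and the `spaces` module is only importable inside a Hugging Face Space.

```python
# Minimal ZeroGPU sketch: the decorated function holds a GPU only while
# it runs; the CPU path never triggers an allocation. Function and model
# names here are assumptions, not this Space's actual code.
import spaces
from faster_whisper import WhisperModel

@spaces.GPU  # GPU is allocated on call and released afterwards
def transcribe_gpu(audio_path: str) -> str:
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path)
    return " ".join(seg.text for seg in segments)

def transcribe_cpu(audio_path: str) -> str:
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return " ".join(seg.text for seg in segments)
```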
## 🚀 How to Use

1. Upload an audio file or record from your microphone.
2. **Faster-Whisper ASR**:
   1. Select a model variant from the dropdown.
   2. Choose the transcription language (default: auto-detect).
   3. Pick device: GPU or CPU.
   4. Toggle diarization on/off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:
   1. Select a SenseVoice model.
   2. Choose the transcription language.
   3. Pick device: GPU or CPU.
   4. Toggle punctuation on/off.
   5. Toggle diarization on/off.
   6. Click **Transcribe with SenseVoice**.
4. View both the plain transcript and the color-coded, speaker-labeled diarized transcript side by side (the Gradio wiring behind the two buttons is sketched below).
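The two-button layout those steps describe maps onto a Gradio Blocks skeleton roughly like this; the model lists and handler bodies are placeholders, not the Space's actual code.

```python
import gradio as gr

def transcribe_fw(audio_path, model_name):
    # Placeholder: the real handler runs faster-whisper (see sketch above).
    return f"[{model_name}] transcript of {audio_path}"

def transcribe_sv(audio_path, model_name):
    # Placeholder: the real handler runs SenseVoice via funasr.
    return f"[{model_name}] transcript of {audio_path}"

with gr.Blocks() as demo:
    audio = gr.Audio(sources=["upload", "microphone"], type="filepath")
    with gr.Row():
        with gr.Column():
            fw_model = gr.Dropdown(["base", "large-v3"], label="Faster-Whisper model")
            fw_btn = gr.Button("Transcribe with Faster-Whisper")
            fw_out = gr.Textbox(label="Faster-Whisper transcript")
        with gr.Column():
            sv_model = gr.Dropdown(["iic/SenseVoiceSmall"], label="SenseVoice model")
            sv_btn = gr.Button("Transcribe with SenseVoice")
            sv_out = gr.Textbox(label="SenseVoice transcript")
    fw_btn.click(transcribe_fw, [audio, fw_model], fw_out)
    sv_btn.click(transcribe_sv, [audio, sv_model], sv_out)

demo.launch()
```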
## 📁 Files

* **app.py**
  Main Gradio app implementing dual ASR pipelines with device control, diarization, and Chinese conversion.
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, and the NVIDIA cuBLAS/cuDNN wheels.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 environment for GPU acceleration (a hedged sketch follows this list).
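A sketch of what such a Dockerfile might contain; the base-image tag and install steps are assumptions, not the file shipped with the Space.

```dockerfile
# Assumed CUDA 12 + cuDNN 9 runtime base; the actual Dockerfile may pin
# a different tag or install extra system packages.
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04

# ffmpeg is required for audio decoding (see Notes below)
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip ffmpeg && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 7860
CMD ["python3", "app.py"]
```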
## ⚠️ Notes

* **Hugging Face token**: Set `HF_TOKEN` (or `HUGGINGFACE_TOKEN`) in the Space secrets for authenticated access to the gated diarization models (see the sketch after these notes).
* **GPU allocation**: GPU resources are acquired only when GPU is explicitly selected, thanks to the `spaces.GPU` decorator.
* **Python version**: Python 3.10+ is recommended.
* **System `ffmpeg`**: Ensure `ffmpeg` is installed on the host (or via the Dockerfile) for audio processing.
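Token handling for the gated diarization models typically looks like this; the pipeline id `pyannote/speaker-diarization-3.1` is an assumption about which checkpoint the Space loads.

```python
# Load a gated pyannote pipeline with a token from the Space secrets.
import os
from pyannote.audio import Pipeline

token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_TOKEN")
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=token
)

diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```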
## 🛠️ Dependencies

The pins below are collected into a requirements.txt sketch after the list.

* **Python**: 3.10+
* **gradio** (>=3.39.0)
* **torch** (>=2.0.0) & **torchaudio**
* **transformers** (>=4.35.0)
* **faster-whisper** (>=1.1.1) & **ctranslate2** (==4.5.0)
* **funasr** (>=1.0.14)
* **pyannote.audio** (>=2.1.1) & **huggingface-hub** (>=0.18.0)
* **pydub** (>=0.25.1) & **ffmpeg-python** (>=0.2.0)
* **opencc-python-reimplemented**
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**
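Collected from the pins above, requirements.txt should look roughly like this (the shipped file may differ in ordering or extras):

```text
gradio>=3.39.0
torch>=2.0.0
torchaudio
transformers>=4.35.0
faster-whisper>=1.1.1
ctranslate2==4.5.0
funasr>=1.0.14
pyannote.audio>=2.1.1
huggingface-hub>=0.18.0
pydub>=0.25.1
ffmpeg-python>=0.2.0
opencc-python-reimplemented
termcolor
nvidia-cublas-cu12
nvidia-cudnn-cu12
```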
## License

MIT

---
## Chinese (Taiwan) Version

# OpenAI Whisper vs. Alibaba FunASR SenseVoice Feature Guide

This Space compares **faster-whisper** models side by side with Alibaba FunAudioLLM's **SenseVoice** models, with the following features:

* A free choice among multiple faster-whisper and SenseVoice models
* Configurable recognition language (full list of supported language codes)
* Explicit device switching (GPU/CPU), with GPU allocation deferred via the `spaces.GPU` decorator
* Speaker diarization with `pyannote.audio`, labeling each speaker in the transcript
* Automatic conversion of Simplified Chinese to Taiwan Traditional Chinese via `opencc` (see the sketch after this list)
* A color-coded, conversation-style transcript panel that can be scrolled and copied
* Semi-real-time segmented output: the transcript accumulates on screen as each speech segment or speaker turn finishes processing
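The conversion step is a one-liner with `opencc`; the `s2twp` config (Simplified to Taiwan Traditional with phrase localization) is an assumption, and the app may use the plainer `s2tw`.

```python
# Convert Simplified Chinese ASR output to Taiwan Traditional Chinese.
# "s2twp" also localizes mainland phrases; "s2tw" is the plain variant.
from opencc import OpenCC

cc = OpenCC("s2twp")
print(cc.convert("简体中文的语音识别结果"))  # prints the Taiwan Traditional form
```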
## 🚀 Usage

1. Upload an audio file or record audio from your microphone.
2. **Faster-Whisper ASR**:
   1. Select a model variant.
   2. Choose the recognition language (auto-detect by default).
   3. Switch the compute device: GPU or CPU.
   4. Toggle speaker diarization on/off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:
   1. Select a SenseVoice model.
   2. Set the recognition language.
   3. Switch the compute device: GPU or CPU.
   4. Toggle punctuation on/off.
   5. Toggle speaker diarization on/off.
   6. Click **Transcribe with SenseVoice**.
4. View the plain-text transcript and the color-coded diarized transcript side by side (a minimal SenseVoice invocation is sketched after these steps).
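A minimal SenseVoice invocation through funasr's `AutoModel`, following the upstream SenseVoice examples; the exact arguments app.py passes (VAD model, device, batching) are assumptions.

```python
# Hedged SenseVoice sketch via funasr; follows the upstream examples
# rather than this Space's exact code.
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(model="iic/SenseVoiceSmall", device="cpu")
result = model.generate(
    input="audio.wav",
    language="auto",  # or "zh", "en", "yue", "ja", "ko"
    use_itn=True,     # inverse text normalization: punctuation and numerals
)
print(rich_transcription_postprocess(result[0]["text"]))
```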
## 📁 File Layout

* **app.py**
  Source of the Gradio app, implementing the dual ASR pipelines with device selection, speaker diarization, and Chinese conversion.
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, and the cuBLAS/cuDNN wheels.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 Docker environment.
## ⚠️ Notes

* **Hugging Face token**: Set `HF_TOKEN` or `HUGGINGFACE_TOKEN` in the Space secrets so the diarization model can be downloaded.
* **GPU allocation**: GPU resources are requested only when GPU is selected.
* **Python version**: Python 3.10 or later is recommended.
* **System ffmpeg**: Make sure `ffmpeg` is installed on the host or in the container for audio processing.
## 🛠️ Dependencies

* **Python**: 3.10+
* **gradio**: >=3.39.0
* **torch** & **torchaudio**: >=2.0.0
* **transformers**: >=4.35.0
* **faster-whisper**: >=1.1.1 & **ctranslate2**: ==4.5.0
* **funasr**: >=1.0.14
* **pyannote.audio**: >=2.1.1 & **huggingface-hub**: >=0.18.0
* **pydub**: >=0.25.1 & **ffmpeg-python**: >=0.2.0
* **opencc-python-reimplemented**
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**
## License

MIT