---
license: cc-by-4.0
language:
- en
- it
- pt
- de
- fr
- es
- ja
- zh
tags:
- automatic-speech-recognition
- speech
- audio
- Transformer
- flow-matching
- discrete-flow-matching
- pytorch
- hf-asr-leaderboard
---

# Drax: Speech Recognition with Discrete Flow Matching

## Model Overview

The Drax model family provides speech recognition models based on discrete flow matching.

The `drax-v1` model supports eight languages: English, Spanish, French, Portuguese, German, Italian, Japanese and Chinese. It is an encoder-decoder model consisting of a Whisper-large-v3 encoder and a DiT-based decoder, with a total of ~1.2B parameters.

More details on usage are available in our GitHub repo, [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax), and in our [paper](https://arxiv.org/abs/2510.04162).

## Usage

See [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax) for installation instructions.

```python
from drax import Transcriber

# Load the pretrained model
asr = Transcriber(model_path="aiola/drax-v1")

# Transcribe a single file; the result is indexed per input
result = asr.transcribe("/path/to/audio.wav", language="en")
print(result[0].transcript)
```

To control the number of sampling steps, the temperature, and other sampling options:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")
result = asr.transcribe(
    "/path/to/audio.wav",
    language="en",
    sampling_steps=32,
    temperature=1e-2,
)
print(result[0].transcript)
```

Batch inference:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")
audio_paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
languages = ["en", "de"]  # one language code per audio file
result = asr.transcribe(audio_paths, language=languages)

# One result per input, indexed as in the single-file example above
for r in result:
    print(r.transcript)
```

## Citation

```bibtex
@article{navon2025drax,
  title={Drax: Speech Recognition with Discrete Flow Matching},
  author={Navon, Aviv and Shamsian, Aviv and Glazer, Neta and Segal-Feldman, Yael and Hetz, Gill and Keshet, Joseph and Fetaya, Ethan},
  journal={arXiv preprint arXiv:2510.04162},
  year={2025}
}
```