Fish Agent

This demo showcases Fish Agent 3B, an end-to-end language model developed in-house by Fish Audio.
The code and weights are available in our official repository. All related content is released under the CC BY-NC-SA 4.0 license.
The demo is an early beta, and inference speed has not yet been optimized.

Features

The model integrates ASR and TTS internally and requires no external models, making it truly end-to-end rather than a three-stage pipeline (ASR + LLM + TTS).
The model can use reference audio to control the speaking voice.
It can generate audio with strong emotion and expressive prosody.
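
To make the end-to-end distinction concrete, here is a minimal conceptual sketch in Python. It is not the Fish Agent API: every name in it (`Audio`, `three_stage_reply`, `end_to_end_reply`, and the `toy_*` stand-ins) is hypothetical and exists only for illustration. It contrasts the call structure of a three-stage pipeline, where audio is reduced to text before a reply is produced, with a single model that maps input audio (plus optional reference audio) directly to output audio.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Audio:
    """Hypothetical stand-in for a waveform; stores only a description."""
    description: str


def three_stage_reply(
    user_audio: Audio,
    asr: Callable[[Audio], str],
    llm: Callable[[str], str],
    tts: Callable[[str], Audio],
) -> Audio:
    """Three separate models chained through plain text."""
    transcript = asr(user_audio)   # speech -> text
    reply_text = llm(transcript)   # text -> text
    return tts(reply_text)         # text -> speech


def end_to_end_reply(
    user_audio: Audio,
    speech_model: Callable[[Audio, Optional[Audio]], Audio],
    reference: Optional[Audio] = None,
) -> Audio:
    """One model call: audio in, audio out, with optional reference audio."""
    return speech_model(user_audio, reference)


# Toy stand-ins so the sketch runs; real models would replace these.
def toy_asr(audio: Audio) -> str:
    return audio.description


def toy_llm(text: str) -> str:
    return f"reply to [{text}]"


def toy_tts(text: str) -> Audio:
    return Audio(f"default voice: {text}")


def toy_speech_model(audio: Audio, reference: Optional[Audio]) -> Audio:
    # The reference audio, when given, selects the speaking voice.
    voice = reference.description if reference else "default voice"
    return Audio(f"{voice}: reply to [{audio.description}]")


if __name__ == "__main__":
    question = Audio("hello")
    print(three_stage_reply(question, toy_asr, toy_llm, toy_tts))
    print(end_to_end_reply(question, toy_speech_model, Audio("speaker A")))
```

In the three-stage version, the only thing passed between stages is a string, so the caller must wire up and host three components. In the end-to-end version, the caller makes one call and may pass reference audio alongside the input, which mirrors the voice-control feature described above.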