AI-powered Text-to-Speech System
30.0k 2026-05-01
fishaudio/fish-speech
An open-source multilingual text-to-speech system built for natural, realistic, and emotionally expressive voice generation.
Core Features
Multilingual TTS covering 80+ languages.
Dual-Autoregressive architecture with reinforcement learning alignment.
Fine-grained prosody and emotion control using natural language tags.
Native support for multi-speaker and multi-turn conversation generation.
Trained on over 10 million hours of audio data.
Detailed Introduction
Fish Speech is an open-source text-to-speech (TTS) system built around the S2 Pro model. The model combines a Dual-Autoregressive architecture with reinforcement learning (RL) alignment to produce natural, emotionally expressive speech. It was trained on over 10 million hours of audio across 80+ languages. Natural language tags give sub-word-level control over prosody and emotion, and multi-speaker and multi-turn conversation generation is supported natively. The project positions itself as competitive with both open-source and commercial TTS systems.
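To make the idea of tag-based control and multi-speaker, multi-turn input more concrete, here is a minimal Python sketch of how such a script might be assembled as plain text before being handed to a TTS model. It does not call Fish Speech itself. The Turn class, the render_turn and render_script helpers, the "<|speaker:N|>" marker, and the "[tag]" bracket syntax are all assumptions made for illustration; the markup the model actually expects should be taken from the project's own documentation.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One conversational turn (hypothetical structure, for illustration)."""
    speaker: int                 # index of the speaker for this turn
    text: str                    # what the speaker says
    tags: tuple[str, ...] = ()   # free-form natural language cues


def render_turn(turn: Turn) -> str:
    # ASSUMED syntax: '<|speaker:N|>' marks the speaker and '[tag]' marks a
    # prosody or emotion cue. Neither is confirmed by the description above.
    cues = "".join(f"[{tag}] " for tag in turn.tags)
    return f"<|speaker:{turn.speaker}|> {cues}{turn.text}"


def render_script(turns: list[Turn]) -> str:
    # One line per turn, in conversation order.
    return "\n".join(render_turn(t) for t in turns)


script = render_script([
    Turn(0, "Did you hear that?", tags=("whispering",)),
    Turn(1, "It was only the wind.", tags=("calm", "slightly amused")),
    Turn(0, "I hope you are right."),
])
print(script)
```

Keeping speakers and cues as structured data and rendering them in one place means only render_turn has to change if the real tag syntax differs from the one assumed here.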