Text-to-Speech (TTS) Model
8.7k 2026-05-01
fishaudio/Bert-VITS2
An open-source text-to-speech model that combines the VITS2 backbone with multilingual BERT for high-quality, multi-language speech synthesis.
Core Features
Uses the VITS2 architecture as the speech-generation backbone.
Integrates multilingual BERT for improved linguistic understanding across languages.
Capable of synthesizing high-quality, natural-sounding speech.
Open-source, building on established TTS work such as MassTTS.
Detailed Introduction
Bert-VITS2 is an open-source text-to-speech project that pairs the VITS2 architecture with multilingual BERT for linguistic processing. The combination aims to produce high-quality, natural-sounding speech across multiple languages, and the project draws inspiration from MassTTS. Bert-VITS2 is no longer actively maintained; its developers recommend the newer Fish-Speech project as a state-of-the-art alternative that continues to be developed.