FunAudioLLM/CosyVoice
CosyVoice is a multilingual text-to-speech system built on large language models, offering state-of-the-art voice generation, voice cloning, and full-stack deployment capabilities.
Core Features
Quick Start
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
Detailed Introduction
CosyVoice is a text-to-speech (TTS) system built on large language models and designed for high-quality, zero-shot multilingual speech synthesis. It generates natural-sounding voices with strong content consistency and speaker similarity across 9 major languages and numerous Chinese dialects. The project provides inference, training, and deployment tools, making it suitable for production environments. Features such as pronunciation inpainting, text normalization, and bi-streaming for low-latency output make CosyVoice a robust option for a range of voice generation needs and a potential replacement for traditional commercial TTS services.
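To make the bi-streaming idea concrete, the sketch below shows the general pattern of consuming text incrementally while emitting audio incrementally. It is a toy illustration only and does not use CosyVoice's API: fake_synthesize, bistream, token_source, and the min_chars threshold are all hypothetical names introduced for this example, and the "audio" is placeholder bytes.

```python
from typing import Iterable, Iterator


def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a TTS model: one placeholder byte per character."""
    return b"\x00" * len(text)


def bistream(text_chunks: Iterable[str], min_chars: int = 8) -> Iterator[bytes]:
    """Consume text incrementally and yield audio as soon as enough text is buffered."""
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        # Emit audio once the buffer is long enough, rather than waiting for the full text.
        if len(buffer) >= min_chars:
            yield fake_synthesize(buffer)
            buffer = ""
    # Flush whatever text remains when the input stream ends.
    if buffer:
        yield fake_synthesize(buffer)


def token_source() -> Iterator[str]:
    """Simulates text arriving piece by piece, for example from an upstream LLM."""
    for piece in ["Hello, ", "this is ", "a streaming ", "demo."]:
        yield piece


audio_chunks = list(bistream(token_source()))
```

Because both the input and the output are iterators, the first audio chunk becomes available after only part of the text has arrived, which is where the latency benefit comes from. A real system would choose chunk boundaries at the model or token level instead of a simple character count.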