metavoiceio/metavoice-src
MetaVoice-1B is an open-source 1.2B parameter foundational model for highly expressive, human-like text-to-speech synthesis with advanced voice cloning capabilities.
Core Features
Quick Start
docker-compose up -d ui
Detailed Introduction
MetaVoice-1B is a 1.2 billion parameter open-source model for text-to-speech (TTS). Trained on 100,000 hours of speech, it is designed to produce expressive, human-like English speech with natural emotional rhythm and tone. It supports zero-shot voice cloning for American and British accents from 30 seconds of reference audio. It also supports cross-lingual voice cloning via finetuning, which has worked with as little as one minute of training data for new speakers. The model is released under the Apache 2.0 license, so developers and researchers can use, modify, and customize it for their own speech synthesis work.
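Because zero-shot cloning is described as needing about 30 seconds of reference audio, it can help to check a clip's length before using it. The following is a minimal sketch using only Python's standard `wave` module. The names `MIN_REFERENCE_SECONDS`, `wav_duration_seconds`, and `is_long_enough` are hypothetical helpers made up for illustration and are not part of the MetaVoice codebase. The sketch also assumes the reference clip is an uncompressed PCM WAV file.

```python
import wave

# Assumption: threshold taken from the description above, not from any MetaVoice API.
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration is the frame count divided by the sample rate.
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path, min_seconds=MIN_REFERENCE_SECONDS):
    """Return True if the clip is at least min_seconds long."""
    return wav_duration_seconds(path) >= min_seconds
```

A call such as `is_long_enough("reference.wav")` (file name hypothetical) returns `True` or `False`. Other formats such as MP3 would need a different reader, since `wave` handles only WAV.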