AI Text-to-Speech (TTS) Model
4.2k 2026-04-18
metavoiceio/metavoice-src
MetaVoice-1B is an open-source, 1.2B-parameter foundation model for expressive, human-like text-to-speech synthesis and zero-shot voice cloning.
Core Features
Generates emotional speech rhythm and tone in English.
Supports zero-shot voice cloning for American and British voices from 30 seconds of reference audio.
Enables cross-lingual voice cloning through finetuning with minimal data.
Synthesizes text of arbitrary length.
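The zero-shot cloning feature above mentions 30 seconds of reference audio, so it can help to check a clip's length before using it. The following is a minimal sketch using only the Python standard library; it is not part of MetaVoice. The helper names and the idea of treating 30 seconds as a minimum are assumptions made for illustration, and the check only handles uncompressed PCM WAV files.

```python
import wave

# Assumption: treat the "30 seconds of reference audio" from the feature
# list as a minimum clip length. The project may accept other lengths.
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
    return frame_count / float(sample_rate)


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Hypothetical pre-flight check: True if the clip meets the minimum length."""
    return wav_duration_seconds(path) >= minimum
```

Calling `is_long_enough("speaker.wav")` on a hypothetical file returns `True` when the clip is at least 30 seconds long; clips in other formats would first need to be converted to WAV.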
Quick Start
docker-compose up -d ui
Detailed Introduction
MetaVoice-1B is an open-source, 1.2-billion-parameter text-to-speech model trained on 100,000 hours of speech data. It generates emotionally expressive, natural-sounding English speech, supports zero-shot voice cloning from short audio samples, and can be adapted for cross-lingual voice cloning through finetuning. Released under the permissive Apache 2.0 license, it gives developers and creators a tool for high-quality, scalable voice synthesis across a range of applications.