metavoiceio/metavoice-src

MetaVoice-1B is an open-source 1.2B parameter foundational model for highly expressive, human-like text-to-speech synthesis with advanced voice cloning capabilities.

Core Features

Generates English speech with emotional rhythm and tone.

Supports zero-shot voice cloning for American and British voices from 30 seconds of reference audio.

Enables cross-lingual voice cloning through finetuning, with as little as one minute of training data.

Synthesizes text of arbitrary length efficiently.

Quick Start

docker-compose up -d ui

Detailed Introduction

MetaVoice-1B is a 1.2 billion parameter open-source model for text-to-speech (TTS). Trained on 100,000 hours of speech, it produces expressive, human-like English speech with natural emotional rhythm and tone. Its zero-shot voice cloning covers American and British accents and needs only 30 seconds of reference audio. It also supports cross-lingual voice cloning via finetuning, which has worked with as little as one minute of training data for new speakers. The model is released under the permissive Apache 2.0 license, so developers and researchers can use, modify, and customize it for their own speech synthesis work.
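
Since zero-shot cloning is described as needing 30 seconds of reference audio, it can help to check a clip's length before using it. The sketch below is not part of MetaVoice: the helper names `wav_duration_seconds` and `is_long_enough` are hypothetical, and it relies only on Python's standard `wave` module, so it assumes the reference clip is an uncompressed WAV file (other formats would need converting first).

```python
import wave

# Minimum reference length quoted in the description above (an assumption
# about how strictly the model enforces it; this script only measures).
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Return True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

For example, `is_long_enough("reference.wav")` (a hypothetical file name) returns False for a 5-second clip, which signals that a longer recording is needed before attempting zero-shot cloning.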