OpenBMB/VoxCPM
VoxCPM2 is a tokenizer-free, 2B-parameter Text-to-Speech system supporting 30 languages, creative voice design, and controllable voice cloning with 48kHz studio-quality audio output.
Core Features
Detailed Introduction
VoxCPM2 is an open-source Text-to-Speech system built on a tokenizer-free, diffusion-autoregressive architecture. The 2B-parameter model was trained on over 2 million hours of multilingual data and targets natural, expressive speech across 30 languages. Its standout features are voice design (creating new voices from text descriptions), controllable voice cloning with style guidance, and studio-quality 48kHz audio output. VoxCPM2 is designed for real-time performance and commercial readiness.
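
To make the 48kHz and real-time figures concrete, here is a minimal sketch of how sample rate relates to audio duration and how a real-time factor (RTF) is commonly computed. It does not call VoxCPM2 or assume anything about its API. The helper names `audio_duration_seconds` and `real_time_factor` and the example numbers are hypothetical, chosen only for illustration.

```python
SAMPLE_RATE_HZ = 48_000  # output sample rate stated for VoxCPM2


def audio_duration_seconds(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a mono waveform containing `num_samples` samples."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return num_samples / sample_rate_hz


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced.

    Values below 1.0 mean audio is generated faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical numbers for illustration, not measured results.
    samples = 240_000
    duration = audio_duration_seconds(samples)
    rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=duration)
    print(f"{duration:.1f} s of audio, RTF = {rtf:.2f}")
```

At 48kHz, one second of mono audio is 48,000 samples, so a system is usually called real-time when its RTF stays below 1.0 on the target hardware.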