WhisperSpeech/WhisperSpeech
An open-source, high-performance text-to-speech system built on Whisper, aiming to be a hackable and commercially safe alternative for speech generation.
Detailed Introduction
WhisperSpeech is an open-source text-to-speech (TTS) system built by "inverting" OpenAI's Whisper speech-recognition model. Its goal is to be for speech what Stable Diffusion is for images: a powerful, customizable, and commercially safe platform for speech generation. The commercial-safety claim rests on the project training only on properly licensed speech recordings. The project advertises fast audio generation, voice cloning, and growing multilingual support, including code-switching between languages within a single sentence. Generation runs as a two-stage, token-based pipeline. First, a text-to-semantic stage predicts semantic tokens derived from the Whisper encoder. Next, a semantic-to-acoustic stage turns those into EnCodec acoustic tokens. Finally, the Vocos vocoder converts the acoustic tokens into a waveform. This design gives developers and researchers a flexible, open alternative to proprietary TTS services.
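The data flow of a two-stage, token-based TTS pipeline can be sketched in a few lines of Python. This is a toy illustration only, not WhisperSpeech's code or API. The names `TwoStageTTS`, `toy_t2s`, `toy_s2a`, and `toy_vocoder`, as well as the vocabulary and codebook sizes, are hypothetical. The toy functions stand in for the real Whisper-derived text-to-semantic model, the EnCodec-token semantic-to-acoustic model, and the Vocos vocoder. As a further assumption for illustration, speaker conditioning is applied only in the second stage, so the same semantic tokens can be rendered in different "voices".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical token types: real models use tensors; here they are plain ints.
SemanticTokens = List[int]
AcousticTokens = List[List[int]]  # one token stream per codebook
Waveform = List[float]

SEMANTIC_VOCAB = 512    # hypothetical vocabulary size, not WhisperSpeech's real value
ACOUSTIC_VOCAB = 1024   # hypothetical codebook size
N_CODEBOOKS = 4         # hypothetical number of parallel codebooks
SAMPLES_PER_FRAME = 8   # hypothetical upsampling factor of the toy vocoder


@dataclass
class TwoStageTTS:
    """Chains text -> semantic tokens -> acoustic tokens -> waveform."""
    text_to_semantic: Callable[[str], SemanticTokens]
    semantic_to_acoustic: Callable[[SemanticTokens, Optional[str]], AcousticTokens]
    vocoder: Callable[[AcousticTokens], Waveform]

    def synthesize(self, text: str, speaker: Optional[str] = None) -> Waveform:
        semantic = self.text_to_semantic(text)                    # stage 1: what is said
        acoustic = self.semantic_to_acoustic(semantic, speaker)   # stage 2: how it sounds
        return self.vocoder(acoustic)                             # tokens -> audio samples


def toy_t2s(text: str) -> SemanticTokens:
    # Stand-in for the text-to-semantic model: one token per character.
    return [ord(ch) % SEMANTIC_VOCAB for ch in text]


def toy_s2a(semantic: SemanticTokens, speaker: Optional[str]) -> AcousticTokens:
    # Stand-in for the semantic-to-acoustic model: the speaker only shifts
    # the acoustic tokens, leaving the semantic content untouched.
    offset = sum(map(ord, speaker)) % ACOUSTIC_VOCAB if speaker else 0
    return [
        [(tok + 7 * k + offset) % ACOUSTIC_VOCAB for tok in semantic]
        for k in range(N_CODEBOOKS)
    ]


def toy_vocoder(acoustic: AcousticTokens) -> Waveform:
    # Stand-in for a vocoder: map each frame's codebook tokens to a sample in [-1, 1].
    samples: Waveform = []
    for frame in zip(*acoustic):
        value = 2.0 * sum(frame) / (N_CODEBOOKS * (ACOUSTIC_VOCAB - 1)) - 1.0
        samples.extend([value] * SAMPLES_PER_FRAME)
    return samples


if __name__ == "__main__":
    tts = TwoStageTTS(toy_t2s, toy_s2a, toy_vocoder)
    plain = tts.synthesize("hello")
    cloned = tts.synthesize("hello", speaker="alice")
    print(len(plain), plain[0], cloned[0])
```

Keeping each stage behind a plain callable is what makes such a pipeline hackable: any one stage can be retrained or swapped without touching the other two, provided the token interface between them stays the same.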