Deep Learning Framework / AI Tool
59.6k 2026-04-18
CorentinJ/Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real time using a three-stage deep learning framework.
Core Features
Real-time voice cloning from short audio samples (5 seconds).
Text-to-speech synthesis using cloned voices.
Three-stage deep learning architecture (SV2TTS).
Provides both a GUI toolbox and a command-line interface.
Supports Windows and Linux with NVIDIA GPU or CPU execution.
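Cloning from a short sample depends on reducing the clip to one fixed-size speaker embedding. The sketch below shows one common way a speaker encoder can do this: cut the clip's feature frames into overlapping windows, embed each window, average the unit-length partial embeddings, and L2-normalize the result. It is a pure-Python toy under stated assumptions. `toy_partial_encoder` is a hypothetical stand-in for the trained neural encoder, and the window and hop sizes (160 and 80 frames) are illustrative values. None of these names are the project's API.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_partials(frames: Sequence[Vector], window: int, hop: int) -> List[Sequence[Vector]]:
    """Cut a sequence of feature frames into overlapping fixed-length windows.

    A clip no longer than one window is returned as a single partial.
    Frames after the last full window are dropped in this simplified version.
    """
    if len(frames) <= window:
        return [frames]
    return [frames[s:s + window] for s in range(0, len(frames) - window + 1, hop)]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def utterance_embedding(
    frames: Sequence[Vector],
    embed_partial: Callable[[Sequence[Vector]], Vector],
    window: int = 160,  # assumed window length in frames (illustrative)
    hop: int = 80,      # assumed hop between windows in frames (illustrative)
) -> Vector:
    """Embed each window, average the unit-length partial embeddings, renormalize."""
    partial_embeds = [l2_normalize(embed_partial(p)) for p in split_into_partials(frames, window, hop)]
    dim = len(partial_embeds[0])
    mean = [sum(e[i] for e in partial_embeds) / len(partial_embeds) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product, which equals cosine similarity when both inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def toy_partial_encoder(window: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for a trained network: per-dimension mean of the frames."""
    dim = len(window[0])
    return [sum(f[i] for f in window) / len(window) for i in range(dim)]


if __name__ == "__main__":
    voice_a = [[1.0, 0.5, 0.2] for _ in range(500)]  # 500 toy feature frames
    voice_b = [[0.1, 0.2, 1.0] for _ in range(500)]
    emb_a = utterance_embedding(voice_a, toy_partial_encoder)
    emb_b = utterance_embedding(voice_b, toy_partial_encoder)
    print(round(cosine_similarity(emb_a, emb_a), 3))
    print(round(cosine_similarity(emb_a, emb_b), 3))
```

The output embedding has unit length, so comparing two voices reduces to a dot product. A clip shorter than one window is still embedded, as a single partial.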
Quick Start
uv run --extra cuda demo_toolbox.py
Detailed Introduction
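This project implements SV2TTS, a three-stage deep learning framework for real-time voice cloning and text-to-speech synthesis. In the first stage, a speaker encoder creates a digital representation of a voice (a speaker embedding) from a few seconds of reference audio. In the second and third stages, a synthesizer and a vocoder use that embedding as a reference to generate speech for arbitrary text. The implementation builds on published research: the SV2TTS paper on transfer learning from speaker verification to multispeaker text-to-speech, a GE2E-trained speaker encoder, a Tacotron-based synthesizer, and a WaveRNN vocoder. It ships a toolbox with both a GUI and a command-line interface, runs on Windows and Linux, and can use an NVIDIA GPU for faster inference or run on the CPU. The author acknowledges that deep learning has moved quickly since the project's release. The project nonetheless remains an open-source option for voice generation.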
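The three-stage data flow can be sketched as follows. The key design point is that the speaker embedding is computed once from the reference clip and then reused for any amount of new text. All names here (`VoiceCloningPipeline`, `toy_encoder`, `toy_synthesizer`, `toy_vocoder`, `HOP_LENGTH`) are hypothetical and are not the project's API. The toy stages only mimic the shapes of the data passed between the real encoder, synthesizer, and vocoder. Conditioning the synthesizer by appending the embedding to every frame loosely follows the SV2TTS idea of combining the speaker embedding with the synthesizer's per-timestep features, but it is heavily simplified.

```python
from dataclasses import dataclass
from typing import Callable, List

Embedding = List[float]
MelFrame = List[float]

HOP_LENGTH = 200  # assumed audio samples generated per spectrogram frame (illustrative)


@dataclass
class VoiceCloningPipeline:
    """Hypothetical wrapper showing how data moves through the three SV2TTS stages."""

    encoder: Callable[[List[float]], Embedding]               # stage 1: reference audio -> speaker embedding
    synthesizer: Callable[[str, Embedding], List[MelFrame]]  # stage 2: text + embedding -> spectrogram frames
    vocoder: Callable[[List[MelFrame]], List[float]]         # stage 3: spectrogram frames -> waveform samples

    def clone(self, reference_audio: List[float]) -> Embedding:
        """Run stage 1 once per voice."""
        return self.encoder(reference_audio)

    def speak(self, text: str, embedding: Embedding) -> List[float]:
        """Run stages 2 and 3 for any text, reusing the same embedding."""
        return self.vocoder(self.synthesizer(text, embedding))


def toy_encoder(audio: List[float]) -> Embedding:
    """Toy 'voice fingerprint': peak and mean absolute amplitude of the clip."""
    peak = max(abs(x) for x in audio)
    mean_abs = sum(abs(x) for x in audio) / len(audio)
    return [peak, mean_abs]


def toy_synthesizer(text: str, embedding: Embedding) -> List[MelFrame]:
    """One frame per character, conditioned by appending the speaker embedding."""
    return [[ord(ch) / 128.0] + embedding for ch in text]


def toy_vocoder(mel: List[MelFrame]) -> List[float]:
    """Emit HOP_LENGTH identical samples per frame; a real vocoder predicts the waveform."""
    return [frame[0] * frame[1] for frame in mel for _ in range(HOP_LENGTH)]


if __name__ == "__main__":
    pipeline = VoiceCloningPipeline(toy_encoder, toy_synthesizer, toy_vocoder)
    voice = pipeline.clone([0.1, -0.2, 0.3] * 1000)  # stand-in for a few seconds of audio
    for sentence in ("hello", "real-time voice cloning"):
        wav = pipeline.speak(sentence, voice)
        print(sentence, len(wav))
```

Keeping stage 1 separate from stages 2 and 3 is what lets one short reference clip drive arbitrary new sentences without re-running the encoder.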