Tags: #tts
SillyTavern/SillyTavern
A powerful, locally installed user interface for interacting with various LLM APIs, image generation engines, and TTS models, offering extensive customization and control.
heshengtao/comfyui_LLM_party
A ComfyUI extension providing a comprehensive LLM agent framework for building custom AI assistants, integrating diverse AI models, and automating complex workflows.
FunAudioLLM/CosyVoice
CosyVoice is a multilingual, large-language-model-based text-to-speech system offering state-of-the-art voice generation, cloning, and full-stack deployment capabilities.
NVIDIA-NeMo/NeMo
A scalable generative AI framework for building, customizing, and deploying large language models, multimodal models, and speech AI models (ASR, TTS).
OpenBMB/VoxCPM
A tokenizer-free, multilingual Text-to-Speech system offering advanced voice design, controllable cloning, and high-quality audio output.
snakers4/silero-models
Silero Models offers a collection of pre-trained, end-to-end text-to-speech models designed for simplicity, speed, and natural-sounding speech generation.
PaddlePaddle/PaddleSpeech
An open-source, easy-to-use speech toolkit built on PaddlePaddle, offering state-of-the-art models for various speech and audio tasks.
fishaudio/Bert-VITS2
An open-source Text-to-Speech system built on the VITS2 backbone, enhanced with multilingual BERT for improved speech synthesis.
AIDC-AI/Pixelle-Video
An AI-powered engine that fully automates short video creation from a single topic, handling script, visuals, voiceover, and music without editing skills.
2noise/ChatTTS
A generative speech model optimized for natural and expressive daily dialogue, especially for LLM assistants.
fishaudio/fish-speech
A state-of-the-art open-source multilingual text-to-speech system offering natural, expressive, and emotionally rich voice generation.
rany2/edge-tts
A Python module and CLI tool to access Microsoft Edge's online text-to-speech service without an API key, Edge browser, or Windows.
index-tts/index-tts
IndexTTS2 is an industrial-level, zero-shot text-to-speech system offering precise duration control and disentangled emotional expression for highly natural and controllable speech synthesis.
remsky/Kokoro-FastAPI
A Dockerized FastAPI wrapper providing a high-performance, multi-platform (CPU/GPU) and multi-language API for the Kokoro-82M text-to-speech model, compatible with OpenAI's speech endpoint.
canopyai/Orpheus-TTS
A state-of-the-art open-source text-to-speech system leveraging LLMs to generate human-like, emotional, and low-latency speech with zero-shot voice cloning capabilities.
abus-aikorea/voice-pro
An AI-powered web application for comprehensive multimedia content creation, offering advanced speech recognition, voice cloning, multilingual TTS, and YouTube video processing.
jing332/tts-server-android
An advanced Android Text-to-Speech (TTS) application offering Microsoft TTS integration, custom HTTP requests, local engine support, and intelligent dialogue recognition.
rhasspy/piper
A fast, local, neural text-to-speech system for efficient and private voice generation.
PeterH0323/Streamer-Sales
A large language model designed to generate compelling product descriptions and sales pitches, enhancing live streaming and e-commerce sales.
myshell-ai/MeloTTS
A high-quality, multilingual text-to-speech library supporting real-time CPU inference across various languages and accents.
netease-youdao/EmotiVoice
An open-source, multi-voice, prompt-controlled text-to-speech engine capable of generating speech with diverse emotions in English and Chinese.
yl4579/StyleTTS2
StyleTTS 2 is a text-to-speech model that achieves human-level speech synthesis by leveraging style diffusion and adversarial training with large speech language models.
metavoiceio/metavoice-src
MetaVoice-1B is an open-source, 1.2B-parameter foundation model for highly expressive, human-like text-to-speech synthesis and zero-shot voice cloning.
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X, enabling zero-shot multilingual text-to-speech synthesis and voice cloning with emotion control.
jaywalnut310/vits
VITS is an end-to-end text-to-speech model that generates highly natural-sounding audio with diverse rhythms, outperforming traditional two-stage TTS systems.