Tags: #tts
SillyTavern/SillyTavern
SillyTavern is a locally installed, highly customizable user interface for interacting with various LLM and image generation APIs, empowering power users with extensive control over their AI interactions.
linyqh/NarratoAI
A tool that uses AI models for one-click video commentary and editing, enabling efficient content creation.
heshengtao/comfyui_LLM_party
A ComfyUI-based framework for building comprehensive LLM agent workflows, integrating diverse AI models, tools, and social platforms.
FunAudioLLM/CosyVoice
CosyVoice is an advanced multilingual, LLM-based text-to-speech system offering state-of-the-art voice generation, cloning, and full-stack deployment capabilities.
NVIDIA-NeMo/NeMo
A scalable generative AI framework for researchers and developers working on large language models, multimodal models, and speech AI (ASR, TTS).
snakers4/silero-models
A collection of pre-trained, end-to-end text-to-speech models designed for simplicity, speed, and natural-sounding speech across multiple languages.
fishaudio/Bert-VITS2
An open-source text-to-speech model that combines the VITS2 backbone with multilingual BERT for high-quality, multi-language speech synthesis.
2noise/ChatTTS
A generative speech model optimized for natural, expressive dialogue in LLM assistants, featuring fine-grained prosodic control.
fishaudio/fish-speech
A state-of-the-art open-source multilingual text-to-speech system offering exceptionally natural, realistic, and emotionally rich voice generation.
rany2/edge-tts
Access Microsoft Edge's online text-to-speech service from Python without needing Edge, Windows, or an API key.
index-tts/index-tts
An industrial-level, zero-shot text-to-speech system offering precise duration control and disentangled emotional expression for highly natural and controllable speech synthesis.
Blaizzy/mlx-audio
An efficient audio processing library built on Apple's MLX framework, enabling fast text-to-speech, speech-to-text, and speech-to-speech capabilities on Apple Silicon devices.
LokerL/tts-vue
A cross-platform desktop application for Microsoft Edge text-to-speech synthesis, built with Electron and Vue.
jing332/tts-server-android
An advanced Android TTS application offering Microsoft TTS integration, custom HTTP requests, local engine support, dialogue recognition, and robust features like auto-retry and text replacement.
rhasspy/piper
A fast, local neural text-to-speech system for generating high-quality speech offline.
PeterH0323/Streamer-Sales
Streamer-Sales is a large language model for live-stream sales that generates compelling product descriptions and integrates digital human generation, RAG, TTS, ASR, and agent capabilities.
myshell-ai/MeloTTS
A high-quality multilingual text-to-speech library supporting various languages and accents, optimized for real-time CPU inference.
netease-youdao/EmotiVoice
EmotiVoice is an open-source, multi-voice, prompt-controlled text-to-speech engine supporting English and Chinese with emotional synthesis capabilities.
yl4579/StyleTTS2
StyleTTS 2 is a cutting-edge text-to-speech model achieving human-level speech synthesis through style diffusion and adversarial training with large speech language models.
metavoiceio/metavoice-src
MetaVoice-1B is an open-source, 1.2B-parameter foundation model for highly expressive, human-like text-to-speech synthesis with advanced voice cloning capabilities.
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X, enabling zero-shot multilingual text-to-speech synthesis and voice cloning with emotion control.
jaywalnut310/vits
VITS is an end-to-end text-to-speech model that generates highly natural-sounding audio with diverse rhythms, outperforming traditional two-stage TTS systems.
supertone-inc/supertonic
Supertonic is a lightning-fast, on-device, multilingual text-to-speech system offering high-quality audio and privacy without cloud dependencies.
OpenMOSS/MOSS-TTS-Nano
MOSS-TTS-Nano is an open-source, multilingual, tiny speech generation model optimized for real-time CPU inference and lightweight integration.