AI Text-to-Speech API Service
4.7k 2026-04-18
remsky/Kokoro-FastAPI
A Dockerized FastAPI wrapper providing a high-performance, multi-platform (CPU/GPU) and multi-language API for the Kokoro-82M text-to-speech model, compatible with OpenAI's speech endpoint.
Core Features
OpenAI-compatible Speech API endpoint
Multi-language support (English, Japanese, Chinese)
NVIDIA GPU acceleration and CPU inference (PyTorch/ONNX)
Phoneme-based audio and per-word timestamped caption generation
Voice mixing with weighted combinations
Quick Start
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest
Detailed Introduction
Kokoro-FastAPI wraps the Kokoro-82M text-to-speech model in a Dockerized FastAPI service. It supports English, Japanese, and Chinese, and runs inference either on NVIDIA GPUs with PyTorch or on CPU with ONNX. It exposes an OpenAI-compatible speech endpoint, so clients written for OpenAI's speech API can target a self-hosted instance instead. Beyond basic synthesis, it offers phoneme-based audio generation, per-word timestamped captions, and weighted voice mixing.
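As a rough illustration of what "OpenAI-compatible" means in practice, the sketch below builds a request in the shape of OpenAI's speech endpoint and sends it to the container started in Quick Start. The port 8880 comes from that command. The route `/v1/audio/speech` and the body fields follow OpenAI's speech API. The model name `kokoro`, the voice name `af_bella`, and the helper names `build_speech_request` and `synthesize` are assumptions made for illustration, so check the project's own documentation for the values it accepts.

```python
import json
import urllib.request

# Port taken from the Quick Start docker command; host is assumed to be local.
BASE_URL = "http://localhost:8880"


def build_speech_request(text, voice="af_bella", model="kokoro",
                         response_format="mp3", base_url=BASE_URL):
    """Build (but do not send) a POST shaped like OpenAI's speech request."""
    payload = {
        "model": model,                  # assumed model identifier
        "input": text,                   # text to synthesize
        "voice": voice,                  # assumed voice name
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path
```

With the container running, calling `synthesize("Hello from Kokoro.")` should write the audio to `speech.mp3`, provided the assumed model and voice names are valid for the deployed image. Building the request separately from sending it keeps the payload easy to inspect without a live server.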