Text-to-Speech API Server
4.8k 2026-05-01
remsky/Kokoro-FastAPI
A Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model, offering multi-language support, CPU/GPU inference, and an OpenAI-compatible API.
Core Features
OpenAI-compatible Speech endpoint for easy integration.
Multi-language support including English, Japanese, and Chinese.
Optimized inference with NVIDIA GPU (PyTorch) or CPU (ONNX).
Phoneme-based audio generation and per-word timestamped captions.
Dockerized deployment with pre-built images for quick setup.
Quick Start
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest
Detailed Introduction
Kokoro-FastAPI wraps the Kokoro-82M model in a Dockerized FastAPI application to provide a text-to-speech service. It supports multiple languages and runs inference on either CPU or NVIDIA GPU. Its OpenAI-compatible API simplifies integration for developers building applications that need speech synthesis, including features such as voice mixing and per-word timestamped captions.
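As a rough illustration of what "OpenAI-compatible" could mean in practice, the sketch below builds a speech request with the Python standard library only. Several details are assumptions rather than facts taken from this page: the path /v1/audio/speech mirrors OpenAI's speech API, port 8880 is taken from the Quick Start command, and the model name "kokoro" and voice name "af_bella" are placeholders that should be checked against the project's own documentation. The helper names build_speech_request and synthesize are hypothetical.

```python
import json
import urllib.request

# Assumptions: the port comes from the Quick Start command, and the path
# follows OpenAI's speech API. Neither is confirmed by this page.
BASE_URL = "http://localhost:8880"
SPEECH_PATH = "/v1/audio/speech"


def build_speech_request(text, voice="af_bella", model="kokoro", response_format="mp3"):
    """Build (but do not send) a POST request in the OpenAI speech format.

    The default voice and model names are placeholders, not verified values.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        BASE_URL + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request to a running server and save the returned audio bytes."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


if __name__ == "__main__":
    # Requires the container from Quick Start to be running locally.
    print(synthesize("Hello from Kokoro."))
```

Separating request construction from sending keeps the payload easy to inspect without a running server. If the endpoint does follow OpenAI's format, an existing OpenAI client pointed at this base URL may work as well, though that should be verified rather than assumed.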