Real-time Speech-to-Text Application
4.0k 2026-05-01
collabora/WhisperLive
A near-real-time speech-to-text application that uses OpenAI's Whisper model to transcribe audio as it arrives.
Core Features
Real-time transcription from live microphone input or pre-recorded audio files.
Supports multiple high-performance backends including Faster Whisper, NVIDIA TensorRT, and OpenVINO.
Provides an OpenAI-compatible REST interface for client-server communication.
Offers Docker support for easy deployment and GPU acceleration.
Configurable server parameters for client limits and connection durations.
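To illustrate what client limits and connection durations mean in practice, here is a minimal sketch of a limiter that caps concurrent clients and flags connections that have run too long. The class `ClientLimiter` and its parameters `max_clients` and `max_connection_time` are hypothetical names chosen for this example; this is not WhisperLive's actual implementation.

```python
import time


class ClientLimiter:
    """Hypothetical helper: caps concurrent clients and per-connection time."""

    def __init__(self, max_clients=4, max_connection_time=600.0,
                 clock=time.monotonic):
        self.max_clients = max_clients                   # assumed: concurrent client cap
        self.max_connection_time = max_connection_time   # assumed: seconds per connection
        self._clock = clock                              # injectable clock for testing
        self._started = {}                               # client id -> connection start time

    def try_add(self, client_id):
        """Admit a client if a slot is free; return True on success."""
        if client_id in self._started:
            return True
        if len(self._started) >= self.max_clients:
            return False
        self._started[client_id] = self._clock()
        return True

    def remove(self, client_id):
        """Free the slot held by a client, if any."""
        self._started.pop(client_id, None)

    def expired(self):
        """Return ids of clients connected longer than the allowed duration."""
        now = self._clock()
        return [cid for cid, t0 in self._started.items()
                if now - t0 > self.max_connection_time]
```

A server loop could call `try_add` when a connection opens, `remove` when it closes, and periodically disconnect whatever `expired` returns.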
Quick Start
pip install whisper-live
Detailed Introduction
WhisperLive is an open-source project that delivers near real-time speech-to-text using OpenAI's Whisper model. It keeps transcription latency low by running inference on optimized backends (Faster Whisper, TensorRT, or OpenVINO), which lets it run on hardware ranging from CPUs to NVIDIA GPUs. It can transcribe live audio streams or pre-recorded files, which makes it a fit for real-time communication, accessibility, and data-processing workloads.
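Near real-time transcription generally depends on sending audio in small pieces rather than as one complete file. The sketch below shows one way to split raw PCM audio into fixed-duration chunks before streaming. The function `iter_chunks` and its defaults are assumptions made for illustration (16 kHz, 16-bit mono, the sample rate Whisper models expect); it does not reproduce WhisperLive's client code.

```python
def iter_chunks(pcm, sample_rate=16000, chunk_seconds=0.5, sample_width=2):
    """Yield successive fixed-duration slices of raw mono PCM bytes.

    Hypothetical helper: the last chunk may be shorter than the others.
    """
    # Number of bytes covering chunk_seconds of audio.
    chunk_bytes = int(sample_rate * chunk_seconds) * sample_width
    if chunk_bytes <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# Example: one second of silence plus a short tail.
audio = bytes(16000 * 2 + 100)
chunks = list(iter_chunks(audio))
```

Smaller chunks reduce the delay before the first words appear, at the cost of more messages per second of audio.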