Conversational AI Framework
10.5k 2026-05-06
TEN-framework/ten-framework
An open-source framework for building real-time, multimodal conversational AI agents, with features such as voice assistance, speaker diarization, and lip-sync avatars.
Core Features
Real-time Multi-Purpose Voice Assistant with RTC/WebSocket support
Speaker Diarization for real-time speaker detection and labeling
Lip-sync avatar integration with multiple vendors (e.g., Live2D, Trulience)
SIP call integration for phone calls handled by AI agents
Hardware integration with ESP32-S3 Korvo V3 for embedded AI
Detailed Introduction
TEN is an open-source framework for developing real-time, multimodal conversational AI agents that process and respond to voice and other modalities with low latency. Its features include a voice assistant with RTC/WebSocket support, real-time speaker diarization, lip-sync avatar integration, SIP call integration, and hardware support for the ESP32-S3 Korvo V3. Together these cover use cases ranging from web-based applications to embedded systems.