On-device Multimodal AI Application
1.6k 2026-04-18
fikrikarim/parlor
Parlor is an on-device, real-time multimodal AI application that enables natural voice and vision conversations, running entirely on your local machine.
Core Features
Real-time, on-device multimodal AI (voice and vision)
Powered by Google's Gemma 4 E2B and Hexgrad's Kokoro TTS
Hands-free voice activity detection (VAD) and barge-in capability
Sentence-level text-to-speech (TTS) streaming for rapid responses
Supports macOS (Apple Silicon) and Linux (GPU) with minimal RAM
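The sentence-level TTS streaming feature above can be illustrated with a minimal sketch: instead of waiting for the model's full reply, the token stream is buffered only until a sentence boundary appears, and each complete sentence is handed to TTS immediately. This is a hypothetical illustration, not Parlor's actual code; the function name and the punctuation-based boundary rule are assumptions.

```python
import re

# Lazily match one complete sentence (ending in . ! or ?) followed by whitespace.
SENTENCE_END = re.compile(r"(.+?[.!?])\s+", re.S)

def stream_sentences(tokens):
    """Accumulate streamed LLM tokens and yield each sentence as soon
    as it completes, so TTS can start speaking before the reply ends."""
    buf = ""
    for tok in tokens:
        buf += tok
        # Emit every full sentence currently sitting in the buffer.
        while (m := SENTENCE_END.match(buf)):
            yield m.group(1)
            buf = buf[m.end():]
    # Flush whatever remains once the token stream ends.
    if buf.strip():
        yield buf.strip()
```

For example, feeding the token chunks `["Hello", " there. How", " are you?"]` yields `"Hello there."` as soon as the second chunk arrives, rather than after the whole reply.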
Quick Start
uv run server.py
Detailed Introduction
Parlor makes advanced AI accessible and free of charge by running real-time, multimodal conversations directly on the user's device. By leveraging small yet capable models, Gemma 4 E2B for understanding and Kokoro for text-to-speech, it avoids server costs and keeps data on the device, sidestepping privacy concerns. This makes it well suited to applications such as language learning: a local, responsive AI companion that interacts through voice and camera, offering capabilities previously limited to high-end commercial AI services.
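A real-time voice pipeline like the one described needs voice activity detection to know when the user starts speaking, including while the assistant is still talking (barge-in). A common baseline, sketched below under assumed names and thresholds (this is not Parlor's implementation), is frame-level RMS energy with a short consecutive-frame debounce:

```python
import math

def is_speech(frame, threshold=0.01):
    """Return True if a PCM audio frame's RMS energy exceeds the
    threshold. `frame` is a sequence of float samples in [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold

def should_barge_in(frames, speech_frames_needed=3):
    """Trigger barge-in (i.e. stop TTS playback) once enough
    consecutive microphone frames contain speech, which filters
    out one-frame noise spikes."""
    run = 0
    for frame in frames:
        run = run + 1 if is_speech(frame) else 0
        if run >= speech_frames_needed:
            return True
    return False
```

Production systems typically use a learned VAD model rather than a fixed energy threshold, but the debounce-then-interrupt structure is the same: only a sustained speech signal cancels the current TTS output.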