On-device Multimodal AI Application
1.6k 2026-04-18

fikrikarim/parlor

Parlor is an on-device, real-time multimodal AI that enables natural voice and vision conversations, running entirely on your local machine.

Core Features

Real-time, on-device multimodal AI (voice and vision)
Powered by Google's Gemma 4 E2B and Hexgrad's Kokoro TTS
Hands-free voice activity detection (VAD) and barge-in capability
Sentence-level text-to-speech (TTS) streaming for rapid responses
Runs on macOS (Apple Silicon) and Linux (with a GPU), with low RAM requirements
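
To illustrate the sentence-level TTS streaming feature listed above, here is a minimal sketch of how a token stream from a language model could be cut into complete sentences as it arrives, so that speech synthesis can start before the full reply is generated. This is not taken from Parlor's code: the function name `stream_sentences`, the regex-based boundary rule, and the example tokens are all assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: '.', '!' or '?' followed by whitespace.
# A real system would likely need to handle abbreviations, numbers, etc.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last ends at a sentence boundary.
        for sentence in parts[:-1]:
            sentence = sentence.strip()
            if sentence:
                yield sentence
        # The last part may still be an unfinished sentence; keep buffering it.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    # Hypothetical model output, delivered token by token.
    demo_tokens = ["Hello", " there", ".", " How", " are", " you", "?", " Fine"]
    for sentence in stream_sentences(demo_tokens):
        # In a voice pipeline, each sentence would be passed to the TTS engine here.
        print(sentence)
```

Each yielded sentence can be synthesized and played while the model is still generating the next one, which is what keeps perceived latency low compared with waiting for the whole response.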

Quick Start

uv run server.py

Detailed Introduction

Parlor aims to make advanced AI accessible and free by running real-time, multimodal conversations directly on the user's device. It relies on small yet capable models, Gemma 4 E2B for understanding and Kokoro for text-to-speech, which removes server costs and avoids the privacy concerns of sending voice and camera data to a remote service. This makes it well suited to applications such as language learning, where a local, responsive AI companion can interact through voice and camera, a capability previously seen only in high-end commercial AI services.
