LLM Inference Server & Desktop Utility
11.7k 2026-04-27
jundot/omlx
An LLM inference server optimized for Apple Silicon, featuring continuous batching, tiered KV caching, and macOS menu bar management for efficient local AI.
Core Features
Optimized LLM inference for Apple Silicon (M-series chips).
Continuous batching and tiered KV caching (in-memory & SSD) for performance.
Managed via macOS menu bar and web-based Admin Dashboard.
Supports various AI models: LLMs, VLMs, OCR, embeddings, and rerankers.
Provides an OpenAI-compatible API for client integration.
Quick Start
brew install omlxDetailed Introduction
oMLX is an advanced LLM inference server specifically engineered for Apple Silicon Macs. It addresses the challenges of running large language models locally by implementing continuous batching and a sophisticated tiered KV caching system that spans both RAM and SSD. This ensures persistent context and efficient resource utilization, making local LLM usage practical for demanding tasks like coding. Managed conveniently from the macOS menu bar and an intuitive web dashboard, oMLX offers an OpenAI-compatible API, enabling seamless integration with existing AI tools and workflows.