LLM Inference Server
9.9k 2026-04-14

jundot/omlx

An optimized LLM inference server for Apple Silicon, featuring continuous batching, tiered KV caching, and macOS menu bar management for efficient local AI.

Core Features

Optimized LLM inference for Apple Silicon (M-series chips).
Continuous batching and tiered KV caching (in-memory & SSD).
Convenient management via a macOS menu bar application.
Supports various AI models: LLMs, VLMs, OCR, embeddings, and rerankers.
OpenAI-compatible API and a web-based Admin Dashboard for monitoring and chat.

Quick Start

omlx serve --model-dir ~/models
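Once the server is running, it can be queried like any OpenAI-compatible endpoint. Below is a minimal sketch using only Python's standard library; note that the base URL, port, and `/v1/chat/completions` path are assumptions based on the OpenAI API convention, not confirmed oMLX defaults, so check the project docs for the actual values.

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload to an OpenAI-compatible endpoint and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server; the URL and model name are placeholders):
#   print(chat("http://localhost:8000", "my-model", "Hello!"))
```

Because the API follows the OpenAI wire format, existing OpenAI client libraries should also work by pointing their base URL at the local server.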

Detailed Introduction

oMLX tackles the challenge of running large language models locally on macOS with both convenience and control. It provides an inference server optimized for Apple Silicon, combining continuous batching with a tiered KV cache that spans RAM and SSD, so past context stays cached and reusable across requests. This makes local LLMs practical for context-heavy workloads such as coding assistants. The whole stack is managed from the macOS menu bar, which simplifies deploying AI models and switching between them.
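The tiered-cache idea, keeping hot KV entries in RAM and spilling cold ones to SSD instead of discarding them, can be sketched in a few lines. This is an illustrative toy, not oMLX's actual implementation; the class, eviction policy, and on-disk layout are invented for the example.

```python
import os
import pickle
import tempfile
from collections import OrderedDict
from typing import Optional

class TieredKVCache:
    """Toy two-tier cache: an in-memory LRU that spills evictions to disk.

    A real KV cache holds attention key/value tensors per token block;
    here the values are arbitrary picklable objects.
    """

    def __init__(self, ram_capacity: int, spill_dir: Optional[str] = None):
        self.ram_capacity = ram_capacity
        self.ram: OrderedDict = OrderedDict()
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="kvcache-")

    def _path(self, key: str) -> str:
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key: str, value: object) -> None:
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            # Evict the least-recently-used entry to disk instead of dropping it.
            old_key, old_val = self.ram.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_val, f)

    def get(self, key: str) -> Optional[object]:
        if key in self.ram:                 # RAM hit
            self.ram.move_to_end(key)
            return self.ram[key]
        path = self._path(key)
        if os.path.exists(path):            # SSD hit: promote back to RAM
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
            return value
        return None                         # miss: the KV block must be recomputed
```

The payoff of this design is that an SSD hit costs a disk read rather than a full prefill pass over the prompt, which is why reusable cached context matters for long coding sessions.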


© 2026 OSS Alternative. hotgithub.com - All rights reserved.