LLM Inference Server
9.9k 2026-04-14
jundot/omlx
An optimized LLM inference server for Apple Silicon, featuring continuous batching, tiered KV caching, and macOS menu bar management for efficient local AI.
Core Features
Optimized LLM inference for Apple Silicon (M-series chips).
Continuous batching and tiered KV caching (in-memory & SSD).
Convenient management via a macOS menu bar application.
Supports various AI models: LLMs, VLMs, OCR, embeddings, and rerankers.
OpenAI-compatible API and a web-based Admin Dashboard for monitoring and chat.
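Because the server exposes an OpenAI-compatible API, any standard OpenAI-style client or HTTP request should work against it. The sketch below, using only the Python standard library, shows the shape of such a request; the host, port, and model name are illustrative assumptions, not values documented by oMLX.

```python
import json
import urllib.request

# Hypothetical endpoint: the actual host and port depend on your oMLX configuration.
API_URL = "http://localhost:8000/v1/chat/completions"

# Standard OpenAI-style chat-completions payload; the model name is illustrative.
payload = {
    "model": "mlx-community/Qwen2.5-7B-Instruct-4bit",
    "messages": [
        {"role": "user", "content": "Explain KV caching in one sentence."}
    ],
    "temperature": 0.7,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is running:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the request format matches OpenAI's, existing SDKs can typically be pointed at the local server by overriding their base URL.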
Quick Start
omlx serve --model-dir ~/models
Detailed Introduction
oMLX addresses the challenge of running large language models locally on macOS, offering both convenience and control. It provides an inference server optimized specifically for Apple Silicon, combining continuous batching with a tiered KV cache that spans RAM and SSD. This design keeps past context cached and reusable across requests, making local LLMs practical for demanding tasks such as coding. Managed directly from the macOS menu bar, oMLX simplifies deploying and running a variety of AI models for developers and everyday users alike.
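The idea behind a RAM-and-SSD tiered KV cache can be illustrated with a small sketch: hot entries stay in memory, and the least-recently-used entries spill to disk when a capacity limit is hit. This is a conceptual toy, not oMLX's actual implementation; the class name, capacity policy, and pickle-based spill format are all assumptions made for the example.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier cache: hot entries in RAM, cold entries spilled to disk.

    A conceptual sketch of the RAM-plus-SSD idea, not oMLX's real design.
    """

    def __init__(self, ram_capacity=2, spill_dir=None):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # hot tier, kept in LRU order
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="kvcache-")

    def _disk_path(self, key):
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)  # mark as most recently used
        # Evict least-recently-used entries to the disk tier.
        while len(self.ram) > self.ram_capacity:
            old_key, old_val = self.ram.popitem(last=False)
            with open(self._disk_path(old_key), "wb") as f:
                pickle.dump(old_val, f)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        path = self._disk_path(key)
        if os.path.exists(path):
            # Promote a spilled entry back into RAM on access.
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
            return value
        return None  # never cached
```

The same access pattern is what makes reusing past context cheap: a conversation's KV entries are evicted to SSD rather than recomputed, then promoted back to RAM when the conversation resumes.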