Local LLM Inference Server
4.0k stars · 2026-04-18

Michael-A-Kuykendall/shimmy

A Python-free, Rust-based inference server providing an OpenAI-compatible API for local GGUF and SafeTensors models.

Core Features

100% OpenAI API-compatible endpoints for local LLMs.
A single Rust binary with no Python dependency, for lightweight, dependency-free operation.
Automatic model discovery (Hugging Face caches, Ollama, local directories) and hot model swapping.
Advanced MoE (Mixture of Experts) support for running large models on consumer hardware.
All GPU backends included in a single download; no compilation needed.
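Because the endpoints follow the OpenAI Chat Completions protocol, any client that speaks it can talk to shimmy directly. A minimal sketch, assuming the server is already running locally; the host, port, and model name below are placeholders, so substitute the address `shimmy serve` reports and a model you actually have:

```shell
# MODEL is a hypothetical example name; use a model shimmy has discovered locally.
MODEL="phi-3-mini"

# Build a standard OpenAI-style chat completion request body.
PAYLOAD=$(cat <<EOF
{
  "model": "${MODEL}",
  "messages": [
    {"role": "user", "content": "Say hello in one short sentence."}
  ]
}
EOF
)

# POST to the OpenAI-compatible endpoint. localhost:11435 is an assumed
# address; adjust it to whatever shimmy prints on startup. The || true
# keeps this sketch from failing when no server is running.
RESPONSE=$(curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "${PAYLOAD}" || true)
echo "${RESPONSE:-server not reachable}"
```

Because the request shape is the stock OpenAI one, the same payload works unchanged against any other OpenAI-compatible backend.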

Quick Start

curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy-linux-x86_64 -o shimmy && chmod +x shimmy && ./shimmy serve &
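Once the server is up, a quick smoke test is the standard OpenAI model-listing endpoint. The address below is an assumption; use whatever `shimmy serve` logs on startup:

```shell
# Ask the server which models it has discovered, via the standard
# OpenAI /v1/models listing endpoint. localhost:11435 is an assumed
# address; the || true tolerates the server not running yet.
MODELS=$(curl -s http://localhost:11435/v1/models || true)
echo "${MODELS:-server not reachable}"
```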

Detailed Introduction

Shimmy is a high-performance, dependency-free inference server written in Rust that brings large language models to local environments. Its 100% OpenAI API-compatible endpoint lets developers point existing AI tools and SDKs at local GGUF and SafeTensors models without code changes. Single-binary distribution, automatic configuration, and MoE support make it well suited to private, efficient local inference, including running large models on consumer-grade hardware.
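In practice, "without code changes" usually means repointing an existing OpenAI client at the local server through its base-URL setting. A sketch using the environment variables the official OpenAI SDKs read; the URL is an assumed shimmy address, and the API key is a dummy value since local inference needs none:

```shell
# The official OpenAI SDKs read these variables, so existing tools
# launched from this shell will talk to shimmy with no source edits.
export OPENAI_BASE_URL="http://localhost:11435/v1"  # assumed shimmy address
export OPENAI_API_KEY="sk-local-unused"             # dummy; shimmy does not check it
```

Clients that don't read these variables generally expose an equivalent base-URL option in their own configuration.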


© 2026 OSS Alternative. hotgithub.com - All rights reserved.