Michael-A-Kuykendall/shimmy
Shimmy is a Python-free Rust inference server that provides a 100% OpenAI-compatible API for running local Large Language Models (LLMs) with zero dependencies.
Core Features
Quick Start
curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy-linux-x86_64 -o shimmy && chmod +x shimmy && ./shimmy serve &Detailed Introduction
Shimmy offers a lightweight, dependency-free solution for running Large Language Models locally, acting as a drop-in replacement for the OpenAI API. Built with Rust, it compiles into a single binary, eliminating Python dependencies and simplifying deployment. It automatically discovers GGUF and SafeTensors models, provides hot model swapping, and supports advanced features like MOE for efficient execution of large models on consumer hardware. This enables developers to integrate local LLMs into existing OpenAI-compatible tools and applications with minimal configuration, ensuring privacy and cost-effectiveness.