Local AI Model Management Proxy
3.2k stars · 2026-04-13
mostlygeek/llama-swap
A high-performance Go-based proxy for hot-swapping and managing multiple local generative AI models, compatible with the OpenAI and Anthropic APIs.
Core Features
On-demand hot-swapping of local AI models.
Compatibility with various local inference servers such as llama.cpp, vLLM, and stable-diffusion.cpp.
Supports a wide range of OpenAI and Anthropic API endpoints.
Includes a real-time web UI for model testing, monitoring, and management.
Easy deployment with a single binary and configuration file, no external dependencies.
Quick Start
docker pull ghcr.io/mostlygeek/llama-swap:cuda
Detailed Introduction
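A minimal configuration sketch follows. The general shape (a models map whose entries define the command used to launch a backend) reflects llama-swap's YAML-based configuration; the specific model name, file path, and port below are hypothetical placeholders, so check the project's documentation for the exact field set before relying on this.

```yaml
# config.yaml (sketch, assuming llama-swap's YAML config layout)
models:
  # "qwen" is a hypothetical model name clients will request.
  "qwen":
    # Command llama-swap runs to start this backend on demand.
    # The model path is a placeholder; adjust to your local files.
    cmd: llama-server --port 9001 -m /models/qwen.gguf
```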
llama-swap is an efficient Go-based utility that streamlines the management and use of local generative AI models. Acting as a proxy, it lets users run multiple AI models on one machine and hot-swap between them on demand. Because it supports OpenAI- and Anthropic-compatible API servers, it abstracts away the underlying inference engine and keeps local AI workflows portable across backends. Its single-binary deployment, zero external dependencies, and comprehensive web UI make it an accessible tool for developers and enthusiasts working with local AI.
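The hot-swapping described above is driven by the standard OpenAI-style request itself: the proxy inspects the "model" field, starts or swaps to the matching backend, then forwards the call. A usage sketch with curl (the port 8080 and model name "qwen" are assumptions for illustration; /v1/chat/completions is the standard OpenAI-compatible endpoint):

```shell
# Request model "qwen"; llama-swap launches or swaps to the backend
# configured under that name, then proxies the request to it.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

Pointing any OpenAI-compatible client library at the proxy's base URL works the same way, since model selection happens per request.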