mostlygeek/llama-swap - OSS Alternative - Discover Top Open Source Alternatives to Popular Software
Local AI Model Management Proxy / API Gateway
3.6k 2026-04-26

mostlygeek/llama-swap

llama-swap enables seamless hot-swapping and management of multiple local generative AI models, acting as a unified API gateway compatible with OpenAI and Anthropic standards.

Core Features

Easy deployment with zero dependencies (single binary, config file).
On-demand switching between various local AI models.
Broad compatibility with OpenAI/Anthropic API servers (llama.cpp, vllm, etc.).
Comprehensive Web UI for monitoring, testing, and model control.
API Key support and customizable model loading/unloading.

Quick Start

docker run -it --rm --runtime nvidia -p 9292:8080 -v /path/to/models:/models -v /path/to/custom/config.yaml:/app/config.yaml ghcr.io/mostlygeek/llama-swap:cuda

Detailed Introduction

llama-swap is a high-performance, Go-built tool designed to streamline local generative AI workflows. It acts as a robust proxy, allowing users to run multiple AI models (like those powered by llama.cpp or vllm) on their machine and hot-swap between them on demand. By providing a unified API endpoint compatible with both OpenAI and Anthropic standards, it simplifies interaction with diverse local inference servers, offering features like a real-time web UI, API key management, and flexible model configuration, making local AI development and testing significantly more efficient.

OSS Alternative

Explore the best open source alternatives to commercial software.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.