OSS Alternative - Discover Top Open Source Alternatives to Popular Software

mostlygeek/llama-swap

llama-swap enables seamless hot-swapping and management of multiple local generative AI models, acting as a unified API gateway compatible with OpenAI and Anthropic standards.

Core Features

Easy deployment with zero dependencies (single binary, config file).

On-demand switching between various local AI models.

Broad compatibility with OpenAI/Anthropic API servers (llama.cpp, vllm, etc.).

Comprehensive Web UI for monitoring, testing, and model control.

API Key support and customizable model loading/unloading.

Quick Start

docker run -it --rm --runtime nvidia -p 9292:8080 -v /path/to/models:/models -v /path/to/custom/config.yaml:/app/config.yaml ghcr.io/mostlygeek/llama-swap:cuda

Detailed Introduction

llama-swap is a high-performance, Go-built tool designed to streamline local generative AI workflows. It acts as a robust proxy, allowing users to run multiple AI models (like those powered by llama.cpp or vllm) on their machine and hot-swap between them on demand. By providing a unified API endpoint compatible with both OpenAI and Anthropic standards, it simplifies interaction with diverse local inference servers, offering features like a real-time web UI, API key management, and flexible model configuration, making local AI development and testing significantly more efficient.