Tags: #llm-inference - OSS Alternative - Discover Top Open Source Alternatives to Popular Software

Educational Project / LLM Inference Serving Course
Python
4.1k

skyzh/tiny-llm

A hands-on course for systems engineers to build an efficient LLM inference serving system from scratch on Apple Silicon using MLX, mimicking vLLM's core techniques.

Curated Resource List
Python
5.2k

xlite-dev/Awesome-LLM-Inference

A comprehensive, curated list of research papers and associated code implementations focused on optimizing Large Language Model (LLM) and Vision-Language Model (VLM) inference.

LLM Inference Server & Desktop Utility
macOS
11.7k

jundot/omlx

An LLM inference server optimized for Apple Silicon, featuring continuous batching, tiered KV caching, and macOS menu bar management for efficient local AI.

AI Inference Server
4.7k

Michael-A-Kuykendall/shimmy

Shimmy is a Python-free Rust inference server that provides a 100% OpenAI-compatible API for running local Large Language Models (LLMs) with zero dependencies.

LLM Inference Optimization Library
Python
17.0k

lyogavin/airllm

Optimizes large language model inference to run 70B models on a single 4GB GPU without quantization, enabling efficient deployment on resource-constrained hardware.

AI/ML Inference Server
Docker
3.8k

predibase/lorax

A multi-LoRA inference server designed to serve thousands of fine-tuned LLMs on a single GPU, significantly reducing serving costs while maintaining high throughput and low latency.

Local AI API Platform / CLI Tool
llama.cpp
2.8k

janhq/cortex.cpp

A local AI API platform for running various AI models (vision, speech, language) on diverse hardware with an OpenAI-compatible API.

LLM Inference Engine
Python
13.1k

GeeeekExplorer/nano-vllm

A lightweight and optimized Python library for fast offline large language model inference, offering comparable or better performance than vLLM with a more readable codebase.

Hardware Plugin
vLLM
2.0k

vllm-project/vllm-ascend

A community-maintained hardware plugin that enables vLLM to run seamlessly and efficiently on Ascend NPUs for large language model inference.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.