Tags: #llm-inference

LLM Inference and Serving Engine
Python
76.3k

vllm-project/vllm

A high-throughput and memory-efficient open-source engine designed for fast, easy, and cost-effective serving of large language models.
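vLLM exposes an OpenAI-compatible HTTP API (by default at `http://localhost:8000/v1` when started with `vllm serve`), so any OpenAI-style client can talk to it. A minimal sketch of building such a request body — the model name below is a placeholder, not a recommendation:

```python
import json

def build_chat_request(model: str, user_message: str, max_tokens: int = 128) -> bytes:
    """Serialize an OpenAI-style /v1/chat/completions request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")

# Placeholder model name; use whatever model the server was launched with.
body = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(json.loads(body)["messages"][0]["role"])  # → user
```

POSTing this body to the server's `/v1/chat/completions` endpoint returns a standard chat-completion response.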

Curated Resource List
Python
5.1k

xlite-dev/Awesome-LLM-Inference

A comprehensive, curated list of research papers and associated code for optimizing Large Language Model (LLM) and Vision Language Model (VLM) inference.

LLM Inference Server
macOS
9.9k

jundot/omlx

An optimized LLM inference server for Apple Silicon, featuring continuous batching, tiered KV caching, and macOS menu bar management for efficient local AI.
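Continuous batching, which the entry above highlights, means the server does not wait for a whole batch to finish: the moment one sequence completes, its slot is refilled from the queue. A toy scheduler illustrating the idea — all names and numbers are illustrative, not omlx's actual API:

```python
from collections import deque

def continuous_batching(requests, max_slots=2):
    """requests: list of (request_id, tokens_to_generate).
    Returns request ids in completion order."""
    queue = deque(requests)
    active = {}          # request_id -> tokens still to generate
    completed = []
    while queue or active:
        # Refill free slots immediately (the "continuous" part).
        while queue and len(active) < max_slots:
            rid, n = queue.popleft()
            active[rid] = n
        # One decode step: every active sequence emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]
                completed.append(rid)
    return completed

order = continuous_batching([("a", 3), ("b", 1), ("c", 2)])
print(order)  # → ['b', 'a', 'c']
```

Note that "c" starts as soon as "b" finishes, without waiting for "a" — the throughput win over static batching.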

Local LLM Inference Server
Rust
4.0k

Michael-A-Kuykendall/shimmy

A Python-free, Rust-based inference server providing an OpenAI-compatible API for local LLMs in GGUF and SafeTensors formats.

LLM Inference Optimization Library
Python
16.4k

lyogavin/airllm

AirLLM reduces the memory needed for large language model inference, enabling 70B LLMs to run on a single 4GB GPU without quantization, and the 405B Llama 3.1 on 8GB of VRAM.
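The trick behind such low VRAM figures is layer-by-layer inference: only one layer's weights are resident at a time, so peak memory is roughly one layer rather than the whole model. A toy sketch of the idea, with plain Python lists standing in for tensors and a dict simulating the on-disk weight store (not AirLLM's actual API):

```python
def make_layer_store(num_layers, dim):
    """Simulated on-disk store: layer i is a diagonal matrix scaling by (i + 1)."""
    return {i: [[(i + 1) if r == c else 0 for c in range(dim)] for r in range(dim)]
            for i in range(num_layers)}

def matvec(w, x):
    return [sum(w[r][c] * x[c] for c in range(len(x))) for r in range(len(w))]

def layered_forward(store, x):
    for i in sorted(store):
        layer = store[i]        # "load" only this layer's weights
        x = matvec(layer, x)    # run the activations through it
        del layer               # free it before loading the next
    return x

out = layered_forward(make_layer_store(3, 2), [1.0, 1.0])
print(out)  # → [6.0, 6.0]  (scaled by 1 * 2 * 3)
```

The cost is extra I/O per forward pass, which is the throughput trade-off this approach accepts in exchange for fitting huge models on small GPUs.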

LLM Inference Server
Docker
3.8k

predibase/lorax

A multi-LoRA inference server designed to efficiently serve thousands of fine-tuned Large Language Models on a single GPU, drastically cutting serving costs while maintaining high throughput and low latency.
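Multi-LoRA serving works because every fine-tune shares the same base weights; each adapter is just a small low-rank pair (A, B) applied on top, selected per request. A toy sketch of that structure — the adapter ids, shapes, and helper names are illustrative, not LoRAX's API:

```python
def matvec(w, x):
    return [sum(w[r][c] * x[c] for c in range(len(x))) for r in range(len(w))]

def lora_forward(base, adapters, adapter_id, x):
    """y = base @ x + B @ (A @ x) for the adapter chosen per request."""
    y = matvec(base, x)
    if adapter_id is not None:
        a, b = adapters[adapter_id]          # small rank-r matrices
        delta = matvec(b, matvec(a, x))
        y = [yi + di for yi, di in zip(y, delta)]
    return y

base = [[1, 0], [0, 1]]                      # one shared 2x2 base weight
adapters = {                                 # many cheap per-tenant adapters
    "customer-a": ([[1, 1]], [[0.5], [0.5]]),  # rank-1: A is 1x2, B is 2x1
    "customer-b": ([[1, 0]], [[2.0], [0.0]]),
}
print(lora_forward(base, adapters, "customer-a", [1.0, 1.0]))  # → [2.0, 2.0]
```

Because each adapter adds only the tiny A and B matrices, thousands of them fit in the memory of a single GPU alongside one copy of the base model.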

CLI Tool / Local AI Inference Platform
llama.cpp
2.8k

janhq/cortex.cpp

A local AI API platform designed to run various AI models (vision, speech, language) on local hardware with an OpenAI-compatible API.


© 2026 OSS Alternative. hotgithub.com - All rights reserved.