Tags: #llm-serving
- sgl-project/sglang (Python, 25.7k stars): AI Inference Framework. A high-performance serving framework designed to accelerate inference for large language models and multimodal AI models.
- vllm-project/vllm (Python, 76.3k stars): LLM Inference and Serving Engine. A high-throughput, memory-efficient open-source engine for fast, easy, and cost-effective serving of large language models.
- kvcache-ai/Mooncake (Python, 5.1k stars): LLM Serving Platform. A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.
- skyzh/tiny-llm (Python, 4.1k stars): Educational Course. A course for systems engineers that teaches LLM inference serving on Apple Silicon by building a simplified vLLM-like system from scratch with MLX.
- jina-ai/serve (Docker, 21.9k stars): AI Service Framework. A cloud-native framework for building, deploying, and scaling multimodal AI applications and services over gRPC, HTTP, and WebSockets.
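Several of the engines above (vLLM and SGLang among them) expose an OpenAI-compatible HTTP API when run as a server. A minimal client-side sketch of building such a request, using only the standard library; the base URL, port, and model name are placeholders and depend on how the server was launched:

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style /v1/completions request."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a server started locally on port 8000 (a common default; adjust as needed).
req = build_completion_request("http://localhost:8000", "my-model", "Hello")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON body whose generated text sits under `choices[0].text`, matching the OpenAI completions schema these servers emulate.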