Tags: #serving
LLM Inference and Serving Engine
python
78.1k
vllm-project/vllm
vLLM is a high-throughput and memory-efficient open-source library designed for fast and easy serving of large language models.
LLM Inference Optimization Engine
vllm
8.1k
LMCache/LMCache
LMCache is an LLM serving engine extension designed to significantly reduce Time-To-First-Token (TTFT) and boost throughput by intelligently reusing KV caches across various storage tiers and serving instances.
AI/ML Inference Serving Framework
Hugging Face
4.6k
vllm-project/vllm-omni
A framework for efficient, fast, and cheap serving of omni-modality (text, image, video, audio) AI models.