Tags: #serving

LLM Inference and Serving Engine

78.1k

vllm-project/vllm

vLLM is a high-throughput and memory-efficient open-source library designed for fast and easy serving of large language models.

llm inference serving

LLM Inference Optimization Engine

vllm

8.1k

LMCache/LMCache

LMCache is an LLM serving engine extension designed to significantly reduce Time-To-First-Token (TTFT) and boost throughput by intelligently reusing KV caches across various storage tiers and serving instances.

llm kv-cache inference

AI/ML Inference Serving Framework

Hugging Face

4.6k

vllm-project/vllm-omni

A framework for efficient, fast, and cheap serving of omni-modality (text, image, video, audio) AI models.

multimodal inference serving

vllm-project/vllm

LMCache/LMCache

vllm-project/vllm-omni