LLM Inference Optimization Engine

LMCache/LMCache

LMCache is an LLM serving engine extension designed to significantly reduce Time-To-First-Token (TTFT) and boost throughput, especially for long-context scenarios, by intelligently reusing KV caches.

Core Features

Optimizes LLM inference by reusing KV caches across various storage tiers (GPU, CPU, Disk, S3).
Reduces Time-To-First-Token (TTFT) and increases overall throughput for LLM serving.
Provides substantial performance gains (3-10x savings in response delay) for long-context use cases such as multi-round QA and RAG.
Integrates seamlessly with popular LLM serving engines such as vLLM and SGLang.
Leverages advanced acceleration techniques such as zero-copy CPU offloading, NIXL, and GPUDirect Storage (GDS).
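The KV-cache reuse behind these features can be pictured as a prefix-keyed store: token sequences are chunked and hashed, so a new request that shares a prefix with an earlier one can skip recomputing the KV tensors for that prefix. The sketch below is purely illustrative; the class, function names, and chunk size are assumptions for exposition, not LMCache's actual API.

```python
import hashlib

CHUNK = 4  # tokens per cache chunk (illustrative; real chunk sizes differ)

def chunk_keys(tokens):
    """Hash each token-prefix chunk so equal prefixes map to equal keys."""
    keys, prefix = [], []
    for i in range(0, len(tokens) - len(tokens) % CHUNK, CHUNK):
        prefix.extend(tokens[i:i + CHUNK])
        keys.append(hashlib.sha256(repr(prefix).encode()).hexdigest())
    return keys

class PrefixKVCache:
    """Toy KV-cache store: reuse chunks whose prefix was seen before."""
    def __init__(self):
        self.store = {}  # chunk key -> stand-in for KV tensors

    def insert(self, tokens):
        for key in chunk_keys(tokens):
            self.store.setdefault(key, object())  # placeholder KV data

    def lookup(self, tokens):
        """Return the number of leading tokens whose KV is already cached."""
        hit = 0
        for key in chunk_keys(tokens):
            if key not in self.store:
                break
            hit += CHUNK
        return hit

cache = PrefixKVCache()
cache.insert(list(range(12)))  # first request: 12 tokens computed and stored
reused = cache.lookup(list(range(12)) + [99, 98, 97, 96])
print(reused)  # → 12: the shared 12-token prefix is served from cache
```

Only the full prefix hash matters: the second request's extra four tokens miss the cache and must be computed, while everything before them is reused.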

Detailed Introduction

LMCache is a critical extension for LLM serving engines, addressing the challenges of high latency and low throughput, particularly with complex, long-context prompts. It achieves this by implementing a sophisticated KV cache management system that stores and reuses previously computed KV caches across an entire datacenter, spanning GPU, CPU, disk, and even S3. This intelligent reuse minimizes redundant computations, freeing up valuable GPU cycles and drastically reducing user response times. Its ability to integrate with existing LLM serving platforms and infrastructure providers makes it a versatile solution for enhancing the efficiency and cost-effectiveness of LLM deployments in various applications.
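The tier-spanning reuse described above can be illustrated as a look-aside hierarchy: check the fastest tier first, fall back to slower ones, and promote hits upward so repeated access gets cheaper. This is a minimal sketch of the idea only; the tier names, `put`/`get` interface, and promote-on-hit policy are assumptions, not LMCache's actual storage code.

```python
class TieredKVStore:
    """Illustrative multi-tier KV-cache lookup: GPU -> CPU -> Disk -> S3."""
    TIERS = ["gpu", "cpu", "disk", "s3"]  # fastest to slowest

    def __init__(self):
        self.tiers = {name: {} for name in self.TIERS}

    def put(self, key, kv, tier="gpu"):
        self.tiers[tier][key] = kv

    def get(self, key):
        """Search tiers fastest-first; promote a hit to the GPU tier."""
        for name in self.TIERS:
            if key in self.tiers[name]:
                kv = self.tiers[name][key]
                if name != "gpu":
                    self.tiers["gpu"][key] = kv  # promote for next access
                return kv, name
        return None, None  # full miss: KV must be recomputed

store = TieredKVStore()
store.put("prefix-abc", kv="<kv tensors>", tier="s3")
kv, hit_tier = store.get("prefix-abc")    # found in s3, promoted to gpu
kv2, hit_tier2 = store.get("prefix-abc")  # now a fast gpu hit
print(hit_tier, hit_tier2)  # → s3 gpu
```

The payoff is that a cache entry written anywhere in the datacenter (even S3) is usable by any node, and hot entries migrate toward the GPU where lookups are cheapest.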
