LLM Serving Platform
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.
Core Features
KVCache-centric disaggregated architecture for LLM serving
High-performance data transfer engine for cross-device/machine communication
Efficient management of hidden states for inference and training decoupling
Global multimodal embedding cache for cross-instance sharing
Seamless integration with PyTorch ecosystem and popular LLM frameworks
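A KVCache-centric design can share prefill results across requests and instances by keying cache entries on the token prefix that produced them. The sketch below is purely conceptual (it is not Mooncake's actual API or hashing scheme; all names are illustrative): each fixed-size token block gets a content-addressed key derived from the full prefix up to that block, so requests with a common prompt prefix map to the same leading keys.

```python
import hashlib

def prefix_keys(token_ids, block_size=16):
    """Derive a content-addressed key per fixed-size token block.

    A block's key hashes the entire prefix up to and including that
    block, so two requests that share a prompt prefix produce the
    same leading keys and can reuse each other's cached KV blocks.
    (Illustrative sketch only -- not Mooncake's actual scheme.)
    """
    keys = []
    h = hashlib.sha256()
    # Only complete blocks are cached; a trailing partial block is skipped.
    full_len = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full_len, block_size):
        block = token_ids[start:start + block_size]
        h.update(str(block).encode("utf-8"))       # fold this block into the running prefix hash
        keys.append(h.copy().hexdigest()[:16])     # snapshot the hash as this block's key
    return keys
```

Because the hash is cumulative, two prompts that diverge only after block *k* share keys for blocks 1 through *k*, which is what lets a global cache serve them both.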
Detailed Introduction
Mooncake is an open-source, KVCache-centric disaggregated architecture for high-performance Large Language Model (LLM) serving; it powers leading LLM services such as Kimi by Moonshot AI. The platform separates the compute-intensive stages of inference into distinct resource pools and moves KVCache and hidden states between devices and machines through its Transfer Engine and Mooncake Store. By managing KVCache for reuse, supporting a global multimodal embedding cache, and integrating with the PyTorch ecosystem and popular LLM frameworks, Mooncake improves scalability and resource utilization across AI inference pipelines.
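The disaggregated flow described above can be sketched at a high level: a prefill stage computes KV blocks once and publishes them to a shared store, and a decode stage fetches them instead of recomputing. This is a toy in-process sketch under assumed names (`KVStore`, `prefill`, `decode` are all illustrative, not Mooncake's API; the real Transfer Engine moves data across machines over high-speed transports):

```python
class KVStore:
    """Toy stand-in for a shared, distributed KVCache store (illustrative only)."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def prefill(store, request_id, prompt_tokens):
    """Compute the prompt's KV blocks once and publish them to the store."""
    kv_blocks = [t * 2 for t in prompt_tokens]  # stand-in for real attention KV tensors
    store.put(request_id, kv_blocks)
    return request_id


def decode(store, request_id):
    """Fetch published KV blocks instead of recomputing the prefill."""
    kv_blocks = store.get(request_id)
    if kv_blocks is None:
        raise KeyError("KV blocks not yet published for this request")
    return len(kv_blocks)  # stand-in for generating tokens against the cached KV
```

The point of the separation is that prefill and decode can run on different machines sized for their distinct compute profiles, with the store as the hand-off point.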