LLM Serving Platform
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.
Core Features
KVCache-centric disaggregated architecture for LLM serving
High-performance data transfer engine for cross-device/machine communication
Efficient management of hidden states for inference and training decoupling
Global multimodal embedding cache for cross-instance sharing
Seamless integration with PyTorch ecosystem and popular LLM frameworks
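A KVCache-centric design can share prefill results across requests and instances by keying cache entries on the token prefix that produced them. The sketch below is purely conceptual (it is not Mooncake's actual API or hashing scheme; all names are illustrative): each fixed-size token block gets a content-addressed key derived from the full prefix up to that block, so requests with a common prompt prefix map to the same leading keys.

```python
import hashlib

def prefix_keys(token_ids, block_size=16):
    """Derive a content-addressed key per fixed-size token block.

    A block's key hashes the entire prefix up to and including that
    block, so two requests that share a prompt prefix produce the
    same leading keys and can reuse each other's cached KV blocks.
    (Illustrative sketch only -- not Mooncake's actual scheme.)
    """
    keys = []
    h = hashlib.sha256()
    # Only complete blocks are cached; a trailing partial block is skipped.
    full_len = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full_len, block_size):
        block = token_ids[start:start + block_size]
        h.update(str(block).encode("utf-8"))       # fold this block into the running prefix hash
        keys.append(h.copy().hexdigest()[:16])     # snapshot the hash as this block's key
    return keys
```

Because the hash is cumulative, two prompts that diverge only after block *k* share keys for blocks 1 through *k*, which is what lets a global cache serve them both.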
Detailed Introduction
Mooncake is an open-source, KVCache-centric disaggregated architecture for high-performance Large Language Model (LLM) serving; it powers leading LLM services such as Kimi by Moonshot AI. The platform separates the compute-intensive stages of inference into distinct resource pools and moves KVCache and hidden states between devices and machines through its Transfer Engine and Mooncake Store. By managing KVCache for reuse, supporting a global multimodal embedding cache, and integrating with the PyTorch ecosystem and popular LLM frameworks, Mooncake improves scalability and resource utilization across AI inference pipelines.
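The disaggregated flow described above can be sketched at a high level: a prefill stage computes KV blocks once and publishes them to a shared store, and a decode stage fetches them instead of recomputing. This is a toy in-process sketch under assumed names (`KVStore`, `prefill`, `decode` are all illustrative, not Mooncake's API; the real Transfer Engine moves data across machines over high-speed transports):

```python
class KVStore:
    """Toy stand-in for a shared, distributed KVCache store (illustrative only)."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def prefill(store, request_id, prompt_tokens):
    """Compute the prompt's KV blocks once and publish them to the store."""
    kv_blocks = [t * 2 for t in prompt_tokens]  # stand-in for real attention KV tensors
    store.put(request_id, kv_blocks)
    return request_id


def decode(store, request_id):
    """Fetch published KV blocks instead of recomputing the prefill."""
    kv_blocks = store.get(request_id)
    if kv_blocks is None:
        raise KeyError("KV blocks not yet published for this request")
    return len(kv_blocks)  # stand-in for generating tokens against the cached KV
```

The point of the separation is that prefill and decode can run on different machines sized for their distinct compute profiles, with the store as the hand-off point.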