LLM Serving Platform

kvcache-ai/Mooncake

A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.

Core Features

KVCache-centric disaggregated architecture for LLM serving
High-performance data transfer engine for cross-device/machine communication
Efficient management of hidden states to decouple inference from training
Global multimodal embedding cache for cross-instance sharing
Seamless integration with PyTorch ecosystem and popular LLM frameworks
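A KVCache-centric design typically keys cached KV blocks by the token prefix they cover, so requests that share a prompt prefix can reuse each other's cached blocks. The sketch below is a minimal illustration of that idea, not Mooncake's actual API; the function name and block size are assumptions for the example.

```python
import hashlib

def prefix_block_keys(token_ids, block_size=16):
    """Derive a chained hash key per fixed-size block of prompt tokens.

    Each key covers the entire prefix up to and including that block, so
    two prompts that share a prefix produce identical leading keys and
    can reuse the corresponding cached KV blocks.
    """
    keys, h = [], hashlib.sha256()
    full_blocks = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full_blocks, block_size):
        block = token_ids[start:start + block_size]
        h.update(str(block).encode("utf-8"))  # chain: key depends on all prior blocks
        keys.append(h.copy().hexdigest()[:16])
    return keys

# Two prompts sharing their first 16 tokens get the same first key,
# so the first cached KV block can be shared between them.
a = prefix_block_keys(list(range(32)))
b = prefix_block_keys(list(range(16)) + list(range(100, 116)))
```

Chaining the hash (rather than hashing each block independently) ensures a block is only reused when the entire preceding context matches, which is what makes the cached attention states valid.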

Detailed Introduction

Mooncake is an open-source, KVCache-centric disaggregated architecture designed for high-performance Large Language Model (LLM) serving. It powers leading LLM services such as Kimi by Moonshot AI. The platform disaggregates the compute-intensive stages of inference, notably prefill and decode, and moves data efficiently across devices and machines through its Transfer Engine and Mooncake Store. By managing KVCache and hidden states, supporting multimodal scenarios, and integrating with the PyTorch ecosystem and popular LLM frameworks, it improves scalability and resource utilization in AI inference pipelines.
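The prefill/decode split described above can be sketched as two workers communicating only through a shared KVCache store: a prefill node computes the prompt's KV entries and publishes them, and a decode node fetches them and generates tokens. Everything here is an illustrative stand-in under assumed names (`KVStore`, `prefill`, `decode`), not Mooncake's real Transfer Engine or Store API.

```python
class KVStore:
    """In-memory stand-in for a distributed KVCache store."""
    def __init__(self):
        self._blocks = {}

    def put(self, key, kv_block):
        self._blocks[key] = kv_block

    def get(self, key):
        return self._blocks.get(key)

def prefill(store, request_id, prompt_tokens):
    # A prefill node runs the full prompt forward pass, producing one
    # KV entry per token (represented here by dummy key/value strings),
    # then publishes the cache for decode nodes to pick up.
    kv = [("k%d" % t, "v%d" % t) for t in prompt_tokens]
    store.put(request_id, kv)
    return len(kv)

def decode(store, request_id, max_new_tokens=4):
    # A separate decode node pulls the prefilled cache and extends it
    # one token at a time, never redoing the prompt computation.
    kv = store.get(request_id)
    out = []
    for _ in range(max_new_tokens):
        out.append(len(kv))       # dummy "next token": current cache length
        kv.append(("k", "v"))     # append the new token's KV entry
    return out

store = KVStore()
prefill(store, "req-1", [101, 102, 103])
generated = decode(store, "req-1")
```

The point of the split is that prefill (compute-bound) and decode (memory-bandwidth-bound) can be scaled and scheduled independently, with the store carrying the KVCache between them.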
