Tags: #llm-serving
AI/ML Serving Framework
NVIDIA GPUs
26.4k
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models, optimizing inference throughput and latency.
LLM Serving Platform
Python
5.2k
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving; it is the serving platform behind Moonshot AI's Kimi.
LLM Serving Framework
Docker
12.3k
bentoml/OpenLLM
A framework for self-hosting and serving open-source large language models as OpenAI-compatible API endpoints in the cloud.
AI Service Framework
Docker
21.9k
jina-ai/serve
A cloud-native framework for building and deploying high-performance multimodal AI applications with built-in scaling and orchestration.