gpustack/gpustack
An open-source GPU cluster manager that orchestrates high-performance AI inference engines like vLLM and SGLang for efficient model deployment across diverse environments.
Detailed Introduction
GPUStack is an open-source platform for deploying and managing AI models on GPU clusters. It manages GPUs across multiple clusters, whether they run on-premises, on Kubernetes, or in the cloud. It orchestrates and optimizes inference engines such as vLLM and SGLang for high-performance model serving, with Day 0 model support, advanced caching, and speculative decoding. For operations, it provides automated recovery, load balancing, monitoring, and access control, which together support scalable Model-as-a-Service delivery.