skypilot-org/skypilot
A unified system to run, manage, and scale AI workloads efficiently across any infrastructure, including Kubernetes, Slurm, and over 20 cloud providers, optimizing cost and resource availability.
Core Features
Quick Start
pip install -U "skypilot[kubernetes,aws,gcp,azure,oci,nebius,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,seeweb,shadeform,verda]"
Detailed Introduction
SkyPilot is an open-source platform for running, managing, and scaling AI workloads across heterogeneous infrastructure. It gives AI teams a single interface for deploying jobs on Kubernetes, Slurm, and over 20 cloud providers, and gives infrastructure teams a unified control plane for scheduling and orchestration. By abstracting away infrastructure details, SkyPilot aims to reduce cloud costs through automated resource provisioning, spot instance use, and automatic cleanup, while improving GPU availability and operational efficiency for AI development and deployment.
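The cost-aware provisioning idea described above can be illustrated with a small, self-contained sketch. This is not SkyPilot's API or its actual optimizer: the Offer type, the pick_offer function, and the CATALOG entries (providers, prices, availability) are all hypothetical, made up only to show the general pattern of choosing the cheapest available option and falling back when spot capacity is not allowed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """One hypothetical way to obtain a GPU (not a SkyPilot type)."""
    provider: str
    gpu: str
    hourly_usd: float
    spot: bool
    available: bool


def pick_offer(offers: list[Offer], gpu: str,
               allow_spot: bool = True) -> Optional[Offer]:
    """Return the cheapest available offer for `gpu`, or None if none qualifies."""
    candidates = [
        o for o in offers
        if o.gpu == gpu and o.available and (allow_spot or not o.spot)
    ]
    # min() with default=None avoids raising on an empty candidate list.
    return min(candidates, key=lambda o: o.hourly_usd, default=None)


# Made-up catalog: prices and availability are illustrative assumptions only.
CATALOG = [
    Offer("kubernetes", "A100", 0.00, spot=False, available=False),
    Offer("aws", "A100", 4.10, spot=False, available=True),
    Offer("gcp", "A100", 1.50, spot=True, available=True),
    Offer("azure", "A100", 3.40, spot=False, available=True),
]
```

With this catalog, pick_offer(CATALOG, "A100") selects the gcp spot offer, and pick_offer(CATALOG, "A100", allow_spot=False) falls back to the azure on-demand offer. A real scheduler would also have to handle concerns this sketch ignores, such as quotas, regions, and preemption recovery.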