AI Inference Framework
25.7k 2026-04-13
sgl-project/sglang
A high-performance serving framework designed to accelerate inference for large language models and multimodal AI models.
Core Features
Achieves significant inference performance gains (e.g., 25x on NVIDIA GB300).
Provides day-0 support for the latest open-source LLMs and multimodal models.
Supports diverse hardware platforms including NVIDIA, AMD, and TPUs.
Accelerates generation for video and image models (SGLang Diffusion).
Enables large-scale deployment with features like Expert Parallelism and PD Disaggregation.
Quick Start
pip install sglang
Detailed Introduction
SGLang is a high-performance serving framework engineered to optimize inference for large language models and multimodal AI. It addresses the need for efficient, scalable AI deployment by offering significant speedups, broad hardware compatibility (NVIDIA, AMD, TPU), and rapid integration of newly released open models. SGLang lets developers deploy complex AI applications, including advanced text, image, and video generation, with high throughput and low latency, making it a practical foundation for modern AI inference infrastructure.
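As a sketch of how a Quick Start deployment might be used: after launching a server with `python -m sglang.launch_server`, SGLang exposes an OpenAI-compatible HTTP API, so a standard /v1/chat/completions request works against it. The model name, port, and helper function below are illustrative assumptions, not part of SGLang itself; the snippet only builds the request payload and leaves the actual POST to whatever HTTP client you prefer.

```python
import json

# Assumed setup (not executed by this snippet):
#   python -m sglang.launch_server --model-path <your-model> --port 30000
# The server then speaks the OpenAI-compatible API under /v1.
BASE_URL = "http://localhost:30000/v1"  # illustrative default


def build_chat_request(model: str, prompt: str,
                       max_tokens: int = 64, temperature: float = 0.0) -> dict:
    """Build an OpenAI-compatible /chat/completions payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


if __name__ == "__main__":
    payload = build_chat_request(
        "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        "Summarize what SGLang does in one sentence.",
    )
    # POST this JSON to f"{BASE_URL}/chat/completions" with any HTTP client.
    print(json.dumps(payload, indent=2))
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the local server by overriding their base URL.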