AI/ML Serving Framework
26.4k 2026-04-25
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models, optimizing inference throughput and latency.
Core Features
High-performance LLM and multimodal model serving
Broad hardware support (NVIDIA, AMD, TPU)
Day-0 support for latest open models
Advanced optimizations (sparse attention, expert parallelism)
Accelerates video and image generation (SGLang Diffusion)
Quick Start
pip install sglangDetailed Introduction
SGLang is an advanced, high-performance serving framework designed to optimize the inference of large language models (LLMs) and multimodal models. It focuses on maximizing throughput and minimizing latency, leveraging cutting-edge techniques like sparse attention and expert parallelism. The framework offers broad hardware compatibility, supporting NVIDIA, AMD, and TPU platforms, and provides rapid integration for new open models. SGLang is crucial for deploying AI models at scale, enabling efficient and cost-effective serving for demanding applications.