LLM Inference and Serving Engine
★ 76.3k · Updated 2026-04-13
vllm-project/vllm
A high-throughput and memory-efficient open-source engine designed for fast, easy, and cost-effective serving of large language models.
Core Features
State-of-the-art serving throughput with PagedAttention
Efficient memory management and continuous batching
Broad support for quantization techniques (FP8, INT4, GPTQ/AWQ)
Flexible distributed inference (tensor, pipeline, data parallelism)
Seamless integration with Hugging Face models and OpenAI-compatible API
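The PagedAttention idea behind the features above can be sketched in a few lines: each sequence's KV cache grows in fixed-size blocks drawn from a shared free pool, so memory is claimed on demand rather than reserved up front for the maximum sequence length. This is an illustrative toy, not vLLM's actual implementation; the `BlockAllocator` class and its methods are invented for this sketch (only the default block size of 16 tokens matches vLLM).

```python
BLOCK_SIZE = 16  # tokens per cache block (vLLM's default block size)

class BlockAllocator:
    """Toy PagedAttention-style block allocator (illustrative only)."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, position):
        """Map token `position` of a sequence to a physical block id."""
        table = self.tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:       # all current blocks are full
            table.append(self.free.pop())    # grab a new block on demand
        return table[position // BLOCK_SIZE]

    def release(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free.extend(self.tables.pop(seq_id, []))

alloc = BlockAllocator(num_blocks=8)
for pos in range(20):                        # a 20-token sequence
    alloc.append_token("req-0", pos)
print(len(alloc.tables["req-0"]))            # 2 blocks used (ceil(20/16))
print(len(alloc.free))                       # 6 blocks still free
```

Because blocks are freed the moment a request finishes, many concurrent sequences can share one fixed pool, which is what makes continuous batching memory-efficient.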
Quick Start
uv pip install vllm
Detailed Introduction
vLLM is a powerful open-source library for LLM inference and serving, originating from UC Berkeley. It excels in delivering state-of-the-art throughput and memory efficiency through innovations like PagedAttention and continuous batching. Designed for flexibility, vLLM supports a wide array of quantization methods, distributed inference strategies, and integrates seamlessly with over 200 Hugging Face models. It provides an easy-to-use platform for deploying LLMs, making advanced AI serving accessible and cost-effective for various hardware and applications.
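As a minimal sketch of the OpenAI-compatible API mentioned above: once a server is running (e.g. `vllm serve <model>`), any OpenAI-style client can query its `/v1/chat/completions` endpoint. The model name, port, and helper function names below are illustrative assumptions, not vLLM defaults you must use.

```python
import json
import urllib.request

def build_chat_request(prompt, model="meta-llama/Llama-3.1-8B-Instruct"):
    """Build an OpenAI-style chat completion request body.

    The model name is an assumption for illustration; use whichever
    model your vLLM server was started with.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

def chat(prompt, base_url="http://localhost:8000/v1"):
    """Send a chat request to a locally running vLLM server."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint mirrors the OpenAI API schema, existing OpenAI SDK clients can be pointed at the server simply by changing the base URL.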