vLLM is an open-source library for serving large language models, designed for high throughput, memory efficiency, and ease of use.