OSS Alternative - Discover Top Open Source Alternatives to Popular Software

GeeeekExplorer/nano-vllm

A lightweight, optimized Python library for fast offline large language model (LLM) inference, with performance comparable to or better than vLLM's and a more readable codebase.

Core Features

Fast offline LLM inference, with speeds comparable to vLLM

Highly readable codebase (~1,200 lines of Python)

Comprehensive optimization suite (prefix caching, tensor parallelism, Torch compilation, CUDA graphs)

vLLM-like API
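Prefix caching, listed above, lets requests that share a prompt prefix (for example, the same system prompt) reuse KV-cache blocks that were already computed instead of recomputing them. The following is a minimal, stdlib-only sketch of hash-based block reuse. BLOCK_SIZE, block_hash, PrefixCache and allocate are hypothetical names, and nano-vllm's actual block manager may differ in details such as the hash function, reference counting and eviction (all omitted here).

```python
import hashlib
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # hypothetical: tokens per KV-cache block (kept tiny for the demo)


def block_hash(prev_hash: Optional[str], tokens: Tuple[int, ...]) -> str:
    """Hash a full block together with its parent's hash, so two blocks
    only match when their entire preceding prefix matches too."""
    h = hashlib.sha256()
    h.update((prev_hash or "").encode())
    h.update(",".join(map(str, tokens)).encode())
    return h.hexdigest()


class PrefixCache:
    """Toy block manager mapping prefix hashes to physical block ids.
    Reference counting and eviction are deliberately omitted."""

    def __init__(self) -> None:
        self.hash_to_block: Dict[str, int] = {}
        self.next_block_id = 0

    def _new_block(self) -> int:
        block_id = self.next_block_id
        self.next_block_id += 1
        return block_id

    def allocate(self, token_ids: List[int]) -> Tuple[List[int], int]:
        """Return (block table, number of prompt tokens whose KV cache is reused)."""
        table: List[int] = []
        cached_tokens = 0
        prev: Optional[str] = None
        for start in range(0, len(token_ids), BLOCK_SIZE):
            chunk = tuple(token_ids[start:start + BLOCK_SIZE])
            if len(chunk) < BLOCK_SIZE:
                # A partial trailing block is still being filled, so it is never shared.
                table.append(self._new_block())
                break
            prev = block_hash(prev, chunk)
            if prev in self.hash_to_block:
                table.append(self.hash_to_block[prev])  # cache hit: reuse the block
                cached_tokens += BLOCK_SIZE
            else:
                block_id = self._new_block()  # cache miss: compute and register it
                self.hash_to_block[prev] = block_id
                table.append(block_id)
        return table, cached_tokens


if __name__ == "__main__":
    cache = PrefixCache()
    shared = [1, 2, 3, 4, 5, 6, 7, 8]  # e.g. a shared system prompt: two full blocks
    print(cache.allocate(shared + [9, 10]))           # ([0, 1, 2], 0)
    print(cache.allocate(shared + [11, 12, 13, 14]))  # ([0, 1, 3], 8)
```

Chaining each block's hash to its parent's means a hit on a block implies every earlier block matched as well, so reused blocks always form a contiguous prefix of the prompt.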

Quick Start

pip install git+https://github.com/GeeeekExplorer/nano-vllm.git

Detailed Introduction

Nano-vLLM is a lightweight vLLM implementation built from scratch. It focuses on fast offline inference for large language models, matching or exceeding the performance of the original vLLM while keeping the Python codebase much smaller and more readable (around 1,200 lines). The project integrates a comprehensive optimization suite, including prefix caching, tensor parallelism, Torch compilation, and CUDA graphs, which makes it an efficient option for running LLMs on a range of hardware, particularly resource-constrained setups. Its vLLM-like API makes adoption easy for existing vLLM users.
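Tensor parallelism, named in the paragraph above, splits each weight matrix across GPUs so that every device stores and multiplies only a shard. The stdlib-only sketch below simulates the two standard Megatron-style splits on a single CPU, with nested lists standing in for tensors and a loop standing in for ranks. The function names are hypothetical, and exactly how nano-vllm shards each layer is an assumption not stated here.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def column_parallel_linear(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Each simulated rank owns a slice of W's output columns and computes x @ W_r.
    Concatenating the partial outputs (an all-gather on real GPUs) gives x @ W."""
    m = len(w[0])
    assert m % world_size == 0, "output dim must divide evenly across ranks"
    step = m // world_size
    partials = [matmul(x, [row[r * step:(r + 1) * step] for row in w])
                for r in range(world_size)]
    return [[v for p in partials for v in p[i]] for i in range(len(x))]


def row_parallel_linear(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Each simulated rank owns a slice of W's input rows and the matching slice
    of x's features. Summing the partial products (an all-reduce) gives x @ W."""
    k = len(w)
    assert k % world_size == 0, "input dim must divide evenly across ranks"
    step = k // world_size
    partials = [matmul([row[r * step:(r + 1) * step] for row in x],
                       w[r * step:(r + 1) * step])
                for r in range(world_size)]
    return [[sum(vals) for vals in zip(*(p[i] for p in partials))]
            for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w = [[1.0, 0.0, 2.0, 1.0],
         [0.0, 1.0, 1.0, 3.0]]
    print(matmul(x, w))                     # [[1.0, 2.0, 4.0, 7.0]]
    print(column_parallel_linear(x, w, 2))  # [[1.0, 2.0, 4.0, 7.0]]
    print(row_parallel_linear(x, w, 2))     # [[1.0, 2.0, 4.0, 7.0]]
```

Pairing a column-parallel layer with a following row-parallel layer (as in Megatron-LM's MLP split) needs only one all-reduce per pair, because the column-parallel output shards feed the row-parallel input shards directly. This sketch checks only the arithmetic, not the inter-GPU communication.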