Curated Resource List
5.1k stars · Updated 2026-04-13

xlite-dev/Awesome-LLM-Inference

A comprehensive, curated list of research papers and associated code for optimizing Large Language Model (LLM) and Vision Language Model (VLM) inference.

Core Features

Covers various LLM/VLM inference optimization techniques.
Includes papers on attention mechanisms like Flash-Attention and Paged-Attention.
Features research on quantization (WINT8/4, FP8, SmoothQuant) and parallelism strategies.
Explores KV Cache optimization, long context attention, and continuous batching.
Provides a downloadable PDF compilation of key papers for beginners.
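To make the quantization topic above concrete, here is a minimal sketch of per-tensor absmax INT8 weight quantization, one of the techniques the list covers under WINT8/4. This is an illustrative toy, not code from the repository or any specific paper.

```python
# Toy absmax INT8 quantization: scale floats so the largest magnitude
# maps to 127, round to integers, and recover approximations by
# multiplying back by the scale. Real systems quantize per-channel
# or per-group; per-tensor is the simplest variant.

def quantize_int8(weights):
    """Map float weights to int8 values with a per-tensor absmax scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

The point of the round trip is that storage drops from 32 bits to 8 bits per weight while reconstruction error stays bounded by half a quantization step.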

Quick Start

python3 download_pdfs.py  # fetches the curated PDF compilation of papers

Detailed Introduction

Awesome-LLM-Inference is an open-source repository that curates research papers, with corresponding code implementations, on efficient Large Language Model (LLM) and Vision Language Model (VLM) inference. It serves as a central hub for researchers and practitioners exploring topics such as attention mechanisms, quantization, parallelism, and KV cache optimization, and it offers a structured overview of the field's most influential contributions.
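One recurring theme across the curated papers is the KV cache: during autoregressive decoding, each token's attention keys and values are appended to a cache rather than recomputed from scratch. The sketch below illustrates the idea with a deliberately simplified single-head, scalar-valued attention; it is an assumption-laden toy, not an implementation from any listed paper.

```python
# Toy KV-cache sketch: at each decode step, append the new key/value
# to the cache and attend over all cached entries. Without the cache,
# every step would recompute keys/values for the full prefix.
import math

def attend(q, keys, values):
    """Softmax attention with scalar 'vectors' for readability."""
    scores = [q * k for k in keys]          # dot products (scalars here)
    m = max(scores)                         # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e / z * v for e, v in zip(exps, values))

k_cache, v_cache = [], []
outputs = []
# Each tuple is a (key, value, query) for one decode step.
for k, v, q in [(0.1, 1.0, 0.2), (0.3, -1.0, 0.4)]:
    k_cache.append(k)                       # O(1) append per step
    v_cache.append(v)
    outputs.append(attend(q, k_cache, v_cache))
```

Optimizations surveyed in the list, such as Paged-Attention, refine how this cache is stored and addressed rather than changing the underlying recurrence.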
