xlite-dev/Awesome-LLM-Inference
Curated Resource List
5.2k 2026-04-26

A comprehensive, curated list of research papers and associated code implementations focused on optimizing Large Language Model (LLM) and Vision-Language Model (VLM) inference.

Core Features

Extensive collection of LLM/VLM inference papers with code.
Covers advanced optimization techniques like Flash-Attention, Paged-Attention, and quantization.
Includes strategies for parallelism, KV cache management, and long context handling.
Provides a compiled PDF for offline reading and quick reference.
Regularly updated with the latest research and trending topics.
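
To make one of the techniques named above more concrete, here is a minimal Python sketch of the bookkeeping behind Paged-Attention-style KV cache management. Each sequence's keys and values live in fixed-size blocks drawn from a shared pool, and a per-sequence block table maps logical block positions to physical blocks. The sketch assumes a toy block size of 4 tokens and tracks only block indices, not actual key/value tensors. BlockAllocator, Sequence, append_token, slot, and release_all are hypothetical names, not code from the repository or from any serving framework.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (hypothetical value chosen for the demo)


@dataclass
class BlockAllocator:
    """Hands out fixed-size physical KV blocks from one shared free pool."""
    num_blocks: int
    free: list[int] = field(init=False)

    def __post_init__(self) -> None:
        # Reversed so that pop() hands out block 0 first.
        self.free = list(range(self.num_blocks - 1, -1, -1))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("KV cache pool exhausted")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """One request: its block table maps logical block index -> physical block id."""
    block_table: list[int] = field(default_factory=list)
    num_tokens: int = 0

    def append_token(self, alloc: BlockAllocator) -> None:
        # Grab a new physical block only when the last one is full (or none exists yet).
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(alloc.allocate())
        self.num_tokens += 1

    def slot(self, pos: int) -> tuple[int, int]:
        # Translate a token position into (physical block id, offset within that block).
        if not 0 <= pos < self.num_tokens:
            raise IndexError(pos)
        return self.block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

    def release_all(self, alloc: BlockAllocator) -> None:
        # Return every block to the pool so other sequences can reuse it.
        for block_id in reversed(self.block_table):
            alloc.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


alloc = BlockAllocator(num_blocks=8)
a, b = Sequence(), Sequence()
for _ in range(6):
    a.append_token(alloc)
for _ in range(3):
    b.append_token(alloc)
print(a.block_table, b.block_table, a.slot(5))  # [0, 1] [2] (1, 1)
```

Because blocks are allocated on demand and returned to a shared pool, a sequence wastes at most one partially filled block instead of reserving a buffer sized for the maximum context length. A real attention kernel would read keys and values through the block table. That part is omitted here.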

Quick Start

To download the papers as PDFs, run the script included in the repository:

python3 download_pdfs.py

Detailed Introduction

Awesome-LLM-Inference is a central index for researchers and developers following advances in Large Language Model (LLM) and Vision-Language Model (VLM) inference optimization. It pairs curated academic papers with their code implementations and covers a wide range of techniques, from attention mechanisms and quantization to parallelism strategies and KV cache optimization. The project aims to make recent methods for more efficient, higher-performance LLM/VLM deployment easier to discover and put into practice.