Curated Resource List
5.1k stars · Updated 2026-04-13

xlite-dev/Awesome-LLM-Inference

A comprehensive, curated list of research papers and associated code for optimizing Large Language Model (LLM) and Vision Language Model (VLM) inference.

Core Features

Covers various LLM/VLM inference optimization techniques.
Includes papers on attention mechanisms like Flash-Attention and Paged-Attention.
Features research on quantization (WINT8/4, FP8, SmoothQuant) and parallelism strategies.
Explores KV Cache optimization, long context attention, and continuous batching.
Provides a downloadable PDF compilation of key papers for beginners.
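To make the quantization topic above concrete, here is a minimal sketch of per-tensor absmax INT8 weight quantization, one of the techniques the list covers under WINT8/4. This is an illustrative toy, not code from the repository or any specific paper.

```python
# Toy absmax INT8 quantization: scale floats so the largest magnitude
# maps to 127, round to integers, and recover approximations by
# multiplying back by the scale. Real systems quantize per-channel
# or per-group; per-tensor is the simplest variant.

def quantize_int8(weights):
    """Map float weights to int8 values with a per-tensor absmax scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

The point of the round trip is that storage drops from 32 bits to 8 bits per weight while reconstruction error stays bounded by half a quantization step.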

Quick Start

python3 download_pdfs.py  # fetches the curated PDF compilation of papers

Detailed Introduction

Awesome-LLM-Inference is an open-source repository that curates research papers, with corresponding code implementations, on efficient Large Language Model (LLM) and Vision Language Model (VLM) inference. It serves as a central hub for researchers and practitioners exploring topics such as attention mechanisms, quantization, parallelism, and KV cache optimization, and it offers a structured overview of the field's most influential contributions.
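One recurring theme across the curated papers is the KV cache: during autoregressive decoding, each token's attention keys and values are appended to a cache rather than recomputed from scratch. The sketch below illustrates the idea with a deliberately simplified single-head, scalar-valued attention; it is an assumption-laden toy, not an implementation from any listed paper.

```python
# Toy KV-cache sketch: at each decode step, append the new key/value
# to the cache and attend over all cached entries. Without the cache,
# every step would recompute keys/values for the full prefix.
import math

def attend(q, keys, values):
    """Softmax attention with scalar 'vectors' for readability."""
    scores = [q * k for k in keys]          # dot products (scalars here)
    m = max(scores)                         # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e / z * v for e, v in zip(exps, values))

k_cache, v_cache = [], []
outputs = []
# Each tuple is a (key, value, query) for one decode step.
for k, v, q in [(0.1, 1.0, 0.2), (0.3, -1.0, 0.4)]:
    k_cache.append(k)                       # O(1) append per step
    v_cache.append(v)
    outputs.append(attend(q, k_cache, v_cache))
```

Optimizations surveyed in the list, such as Paged-Attention, refine how this cache is stored and addressed rather than changing the underlying recurrence.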
