xlite-dev/Awesome-LLM-Inference
A comprehensive, curated list of research papers and associated code implementations focused on optimizing Large Language Model (LLM) and Vision-Language Model (VLM) inference.
Core Features
Quick Start
python3 download_pdfs.py
Detailed Introduction
Awesome-LLM-Inference is a central repository for researchers and developers tracking advances in Large Language Model (LLM) and Vision-Language Model (VLM) inference optimization. It pairs academic papers with their corresponding code implementations, covering techniques that range from attention mechanisms and quantization to parallelism strategies and KV cache optimizations. The project aims to make these methods easier to discover and apply when improving the efficiency and performance of LLM/VLM deployments.