Tags: #quantization
xlite-dev/Awesome-LLM-Inference
A comprehensive, curated list of research papers and associated code for optimizing Large Language Model (LLM) and Vision Language Model (VLM) inference.
bitsandbytes-foundation/bitsandbytes
A PyTorch library enabling accessible large language models by dramatically reducing memory consumption through k-bit quantization for both inference and training.
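A minimal sketch of the most common usage, via the Hugging Face Transformers integration: load a causal LM in 4-bit NF4 with bfloat16 compute. The model id is just an example; any causal LM works.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization; matmuls run in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
)

model_id = "meta-llama/Llama-2-7b-hf"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```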
stochasticai/xTuring
xTuring simplifies the fine-tuning, evaluation, and deployment of open-source Large Language Models (LLMs) on private data, with memory-efficient options such as LoRA and INT4 fine-tuning.
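A minimal sketch of the workflow, assuming the `BaseModel.create` / `finetune` API and the `llama_lora_int4` model key from the project's README; the dataset path is hypothetical.

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load an instruction dataset from a local directory (hypothetical path).
dataset = InstructionDataset("./my_private_data")

# "llama_lora_int4" combines LoRA adapters with INT4 quantization
# for memory-efficient fine-tuning on a single GPU.
model = BaseModel.create("llama_lora_int4")
model.finetune(dataset=dataset)

output = model.generate(texts=["What is quantization?"])
print(output)
```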
ModelCloud/GPTQModel
A toolkit for quantizing (compressing) Large Language Models (LLMs) with hardware acceleration across various GPUs and CPUs, integrating with popular inference frameworks.
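A minimal quantization sketch, assuming GPTQModel's `load` / `quantize` / `save` API; the model id and calibration texts are placeholders.

```python
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B"  # example model
quant_path = "Llama-3.2-1B-gptq-4bit"

# 4-bit GPTQ with group size 128 is the most common setting.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)

# A real run needs a few hundred calibration samples; two shown for brevity.
calibration = [
    "GPTQ minimizes quantization error layer by layer.",
    "Calibration data should resemble the deployment distribution.",
]
model.quantize(calibration, batch_size=1)
model.save(quant_path)
```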
nunchaku-ai/nunchaku
Nunchaku is a high-performance inference engine for 4-bit neural networks, built on SVDQuant and aimed primarily at diffusion models.
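A sketch of plugging Nunchaku into a diffusers pipeline. The class name `NunchakuFluxTransformer2dModel` and the quantized checkpoint id follow the project's README at the time of writing and should be treated as assumptions; exact names may differ across versions.

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load a 4-bit SVDQuant transformer (checkpoint id is an assumption).
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"
)

# Swap it into a standard FLUX pipeline; the rest of the pipeline is unchanged.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a photo of a corgi in the snow", num_inference_steps=28).images[0]
image.save("corgi.png")
```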
kyegomez/BitNet
A PyTorch implementation of BitNet, enabling highly efficient 1-bit transformers for large language models.
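The core idea from the BitNet paper, independent of this particular implementation: binarize weights to ±1 around their mean, rescale by the mean absolute deviation, and train with a straight-through estimator. A self-contained conceptual PyTorch sketch (not this repo's API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryLinear(nn.Module):
    """Conceptual 1-bit linear layer in the spirit of BitNet (not the repo's API)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        alpha = w.mean()                    # centering constant
        beta = (w - alpha).abs().mean()     # restores the weight scale
        w_bin = torch.sign(w - alpha)       # weights in {-1, +1}
        # Straight-through estimator: forward uses the binarized weights,
        # backward passes gradients to the full-precision weights unchanged.
        w_q = w + (w_bin * beta - w).detach()
        return F.linear(x, w_q)

layer = BinaryLinear(512, 256)
y = layer(torch.randn(4, 512))
print(y.shape)  # torch.Size([4, 256])
```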
intel/auto-round
AutoRound is an advanced quantization toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs), enabling high-accuracy, ultra-low-bit inference across diverse hardware.
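A minimal sketch assuming AutoRound's Python API (`AutoRound(model, tokenizer, ...)` followed by `quantize()` and `save_quantized()`); the model id is an example and exact arguments may differ across versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "Qwen/Qwen2.5-0.5B"  # example model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# W4A16 with group size 128; AutoRound tunes the rounding of each weight
# via signed gradient descent on a small calibration set.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./qwen2.5-0.5b-autoround-4bit")
```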
nunchaku-ai/ComfyUI-nunchaku
A ComfyUI plugin that brings Nunchaku's SVDQuant-based 4-bit inference to AI image generation workflows.