Tags: #quantization
axolotl-ai-cloud/axolotl
A free and open-source framework designed for efficient fine-tuning of large language models.
xlite-dev/Awesome-LLM-Inference
A comprehensive, curated list of research papers and associated code implementations focused on optimizing Large Language Model (LLM) and Vision-Language Model (VLM) inference.
bitsandbytes-foundation/bitsandbytes
A PyTorch library that makes large language models more accessible through k-bit quantization, significantly reducing memory consumption for both inference and training.
stochasticai/xTuring
xTuring simplifies fine-tuning and deploying open-source Large Language Models (LLMs) on private data, with a focus on privacy, efficiency, and scalability.
ModelCloud/GPTQModel
A toolkit for quantizing (compressing) Large Language Models (LLMs) with hardware acceleration across various GPUs and CPUs, integrating with popular inference frameworks.
nunchaku-ai/nunchaku
Nunchaku is a high-performance AI inference engine that optimizes 4-bit neural networks, especially diffusion models, for faster and more memory-efficient execution.
kyegomez/BitNet
A PyTorch implementation of BitNet, enabling highly efficient 1-bit transformers for large language models.
intel/auto-round
AutoRound is an advanced quantization toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs), enabling high-accuracy, ultra-low-bit inference across diverse hardware.
nunchaku-ai/ComfyUI-nunchaku
A ComfyUI plugin that integrates Nunchaku, an efficient inference engine for 4-bit quantized neural networks, to accelerate AI model execution.