Tags: #quantization

xlite-dev/Awesome-LLM-Inference (Curated Resource List · python · 5.1k stars)
A comprehensive, curated list of research papers and associated code for optimizing Large Language Model (LLM) and Vision-Language Model (VLM) inference.

bitsandbytes-foundation/bitsandbytes (Deep Learning Optimization Library · Python · 8.1k stars)
A PyTorch library that makes large language models more accessible by dramatically reducing memory consumption through k-bit quantization, for both inference and training.
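
The basic building block behind k-bit schemes like those in bitsandbytes is absmax (symmetric) quantization: scale the tensor by its largest absolute value, then round to a small signed integer grid. A minimal pure-Python sketch of the idea — illustrative only, not the bitsandbytes API:

```python
# Conceptual sketch of symmetric (absmax) int8 quantization.
# Not the bitsandbytes API; the library does this on GPU tensors,
# block-wise, with outlier handling on top.

def quantize_absmax(weights, bits=8):
    """Map floats to signed integers using the absolute maximum as the scale."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [qi * scale for qi in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_absmax(weights)
approx = dequantize(q, scale)          # each value within scale/2 of the original
```

The round trip loses at most half a quantization step per value, which is why the scale (set by the largest outlier) matters so much; bitsandbytes' block-wise variants shrink that worst case by giving each block its own scale.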

stochasticai/xTuring (LLM Fine-tuning Framework · Python · 2.7k stars)
xTuring simplifies the fine-tuning, evaluation, and deployment of open-source Large Language Models (LLMs) on private data, ensuring privacy and efficiency.

ModelCloud/GPTQModel (LLM Optimization Toolkit · huggingface · 1.1k stars)
A toolkit for quantizing (compressing) Large Language Models (LLMs) with hardware acceleration across various GPUs and CPUs, integrating with popular inference frameworks.
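
GPTQ-style toolkits typically quantize weights in small groups, each with its own scale, so one outlier only distorts its group. A simplified sketch of group-wise 4-bit round-to-nearest — this shows only the grouping idea, not GPTQ's error-compensating algorithm or GPTQModel's actual API:

```python
# Simplified group-wise 4-bit quantization: each group of weights gets
# its own scale. Real GPTQ additionally compensates rounding error using
# second-order information; this sketch is round-to-nearest only.

def quantize_groupwise(weights, group_size=4, bits=4):
    qmax = 2 ** (bits - 1) - 1            # 7 for signed 4-bit
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        scale = max(abs(w) for w in g) / qmax or 1.0  # avoid a zero scale
        groups.append(([round(w / scale) for w in g], scale))
    return groups

def dequantize_groupwise(groups):
    out = []
    for q, scale in groups:
        out.extend(qi * scale for qi in q)
    return out

# Small weights and large weights in separate groups: the large group's
# scale does not wash out the small values.
weights = [0.1, -0.2, 0.05, 0.15, 10.0, -9.0, 8.0, 7.0]
groups = quantize_groupwise(weights)
recon = dequantize_groupwise(groups)
```

With a single per-tensor scale the first four weights would all round to nearly the same code; per-group scales are what make 4-bit usable in practice.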

nunchaku-ai/nunchaku (AI Inference Engine & Optimization Library · comfyui · 3.8k stars)
Nunchaku is a high-performance inference engine that optimizes 4-bit neural networks, especially diffusion models, for speed and efficiency.
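
Part of what makes 4-bit inference memory-efficient is storage: two 4-bit codes fit in one byte. A hypothetical packing sketch (illustrative only — not Nunchaku's internal layout, which is tuned for GPU kernels):

```python
# Two unsigned 4-bit values ("nibbles") packed per byte, low nibble
# first. Real engines choose their own layout for kernel efficiency;
# this just shows why 4-bit halves memory versus 8-bit.

def pack_nibbles(values):
    """Pack a list of ints in [0, 15] into bytes, two per byte."""
    if len(values) % 2:
        values = values + [0]             # pad to an even count
    return bytes(lo | (hi << 4) for lo, hi in zip(values[::2], values[1::2]))

def unpack_nibbles(packed, count):
    """Recover the first `count` nibbles from packed bytes."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out[:count]

vals = [3, 15, 0, 7, 9]
packed = pack_nibbles(vals)               # 5 nibbles -> 3 bytes
```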

kyegomez/BitNet (Deep Learning Library · pytorch · 1.9k stars)
A PyTorch implementation of BitNet, enabling highly efficient 1-bit transformers for large language models.
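
In the spirit of 1-bit transformers, each weight is reduced to its sign and a single per-tensor scale (the mean absolute value) preserves overall magnitude, so matrix products become additions and subtractions plus one multiply. A conceptual sketch — not the kyegomez/BitNet API:

```python
# Conceptual 1-bit weight quantization: keep only the sign of each
# weight plus one scalar scale. Illustrative pure Python, not the
# repo's PyTorch implementation.

def binarize(weights):
    """Return sign codes in {-1, +1} and a mean-absolute-value scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale

def binary_dot(signs, scale, x):
    """Dot product with binarized weights: only adds/subtracts, one multiply."""
    return scale * sum(s * xi for s, xi in zip(signs, x))

w = [0.4, -0.6, 0.2, -0.8]
signs, scale = binarize(w)                # signs in {-1, +1}, scale = mean |w|
```

Dropping the multiplications inside the dot product is where the efficiency claim comes from; the single scale keeps activations in the right range.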

intel/auto-round (AI Optimization Library · python · 1.0k stars)
AutoRound is an advanced quantization toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs), enabling high-accuracy, ultra-low-bit inference across diverse hardware.

nunchaku-ai/ComfyUI-nunchaku (ComfyUI Plugin · ComfyUI · 2.8k stars)
An efficient ComfyUI plugin for accelerated 4-bit neural network inference, leveraging Nunchaku and SVDQuant for enhanced performance in AI image generation workflows.
