Tags: #memory-efficiency
LLM Inference Optimization Library
Python
17.0k
lyogavin/airllm
Reduces the memory footprint of large language model inference so that 70B models can run on a single 4GB GPU without quantization, making deployment possible on resource-constrained hardware.