Tags: #inference-optimization
AI Inference Optimization Proxy
Python
3.5k
algorithmicsuperintelligence/optillm
OptiLLM is an OpenAI API-compatible inference proxy that uses 20+ state-of-the-art techniques to significantly boost LLM accuracy and performance on reasoning tasks without requiring any training.
AI Optimization Library
python
1.0k
intel/auto-round
AutoRound is an advanced quantization toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs), enabling high-accuracy, ultra-low-bit inference across diverse hardware.