Tags: #inference-optimization
vllm-project/semantic-router (AI Infrastructure Component, 3.7k stars)
A signal-driven intelligent router designed to optimize the efficiency, safety, and adaptability of multi-model AI systems across varied environments.
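To make "signal-driven routing" concrete, here is a minimal conceptual sketch: a request is scored against per-model signals and dispatched to the best-matching backend. This is an illustration of the idea only, not semantic-router's actual API; the route names, keyword signals, and model names are all invented for the example (the real project uses far richer signals than keyword overlap).

```python
# Toy signal-driven router: dispatch a prompt to the backend model whose
# "signal" (here, a crude keyword set) matches it best.
# All names below are hypothetical, not semantic-router's interface.

from dataclasses import dataclass, field


@dataclass
class Route:
    model: str                                   # backend model to dispatch to
    keywords: set = field(default_factory=set)   # stand-in for a routing signal


ROUTES = [
    Route("code-model", {"python", "bug", "function", "compile"}),
    Route("math-model", {"integral", "prove", "equation", "solve"}),
]
DEFAULT_MODEL = "general-model"


def route(prompt: str) -> str:
    """Pick the route whose signal overlaps the prompt most; else fall back."""
    tokens = set(prompt.lower().split())
    best, score = DEFAULT_MODEL, 0
    for r in ROUTES:
        overlap = len(tokens & r.keywords)
        if overlap > score:
            best, score = r.model, overlap
    return best
```

A production router would replace the keyword overlap with learned classifiers (task type, safety, cost), but the dispatch shape is the same.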
LMCache/LMCache (LLM Inference Optimization Engine, GPU, 8.0k stars)
LMCache is an LLM serving-engine extension that significantly reduces Time-To-First-Token (TTFT) and boosts throughput, especially in long-context scenarios, by intelligently reusing KV caches.
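The core idea behind KV-cache reuse can be sketched in a few lines: the attention state (KV cache) computed for a token prefix is stored keyed by that prefix, so a later request sharing the prefix skips recomputing it and only prefills the new tail tokens. This is a conceptual model, not LMCache's real interface; the class and method names are assumptions for illustration.

```python
# Conceptual KV-cache reuse: cache the (fake) KV state for token prefixes,
# then serve the longest cached prefix to a new request.
# Hypothetical sketch -- not LMCache's actual API.

import hashlib


class PrefixKVStore:
    def __init__(self):
        self._store = {}  # prefix hash -> (prefix length, KV blob)

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(repr(tuple(tokens)).encode()).hexdigest()

    def insert(self, tokens, kv):
        """Record the KV state computed for this exact token prefix."""
        self._store[self._key(tokens)] = (len(tokens), kv)

    def lookup(self, tokens):
        """Return (reused_len, kv) for the longest cached prefix of `tokens`."""
        for end in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:end]))
            if hit is not None:
                return hit
        return 0, None


store = PrefixKVStore()
store.insert([1, 2, 3], kv="kv-for-123")
reused, kv = store.lookup([1, 2, 3, 4, 5])  # shares the prefix [1, 2, 3]
# Only tokens 4 and 5 now need fresh prefill work.
```

In a real deployment the blobs are GPU tensors and the store spans GPU memory, CPU RAM, and disk, which is where the TTFT savings come from.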
algorithmicsuperintelligence/optillm (AI Inference Proxy, Python, 3.4k stars)
OptiLLM is an OpenAI API-compatible inference proxy that improves LLM accuracy and performance on reasoning tasks using 20+ state-of-the-art inference-time optimization techniques, with no additional training required.
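Because the proxy is OpenAI API-compatible, any OpenAI-style client can target it by overriding the base URL. The sketch below builds such a request payload without sending it; the localhost address, port, and the "moa-" technique prefix on the model name are illustrative assumptions (check the project's README for its real defaults and technique names), and the helper function is invented for this example.

```python
# Hedged sketch of calling an OpenAI-compatible proxy: build a chat-completion
# payload aimed at a local proxy endpoint. URL, port, and the technique
# prefix are assumptions, not verified OptiLLM defaults.

def optillm_request(prompt: str,
                    technique: str = "moa",          # assumed technique prefix
                    base_model: str = "gpt-4o-mini") -> dict:
    """Build an OpenAI-style chat payload routed through a local proxy."""
    return {
        "base_url": "http://localhost:8000/v1",      # assumed proxy address
        "model": f"{technique}-{base_model}",        # technique picked via model name
        "messages": [{"role": "user", "content": prompt}],
    }


payload = optillm_request("Solve: what is 17 * 24?")
```

The appeal of this design is that existing OpenAI client code needs no changes beyond the base URL and model string.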
intel/auto-round (AI Optimization Library, Python, 1.0k stars)
AutoRound is an advanced quantization toolkit for large language models (LLMs) and vision-language models (VLMs), enabling high-accuracy, ultra-low-bit inference across diverse hardware.
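To ground "ultra-low-bit quantization", here is a toy sketch of the underlying idea: weights are scaled into a small integer range and rounded, and a tunable per-weight rounding offset can nudge round-to-nearest decisions to better preserve model outputs. This illustrates the general learned-rounding concept only; it is not intel/auto-round's actual algorithm or API, and the function names are invented.

```python
# Toy symmetric low-bit quantization with optional per-weight rounding
# offsets (the knob a learned-rounding method would tune).
# Conceptual sketch -- not AutoRound's implementation.

def quantize(weights, bits=4, offsets=None):
    """Map float weights to signed `bits`-bit integers plus a scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    offsets = offsets or [0.0] * len(weights)
    q = []
    for w, v in zip(weights, offsets):
        # round-to-nearest plus a tunable offset, clamped to the int range
        qi = int(round(w / scale + v))
        q.append(max(-qmax - 1, min(qmax, qi)))
    return q, scale


def dequantize(q, scale):
    """Recover a float approximation from the integers and the scale."""
    return [qi * scale for qi in q]


q, s = quantize([0.11, -0.42, 0.35, 0.07], bits=4)
recovered = dequantize(q, s)   # low-bit approximation of the weights
```

A learned-rounding method tunes the `offsets` (within about [-0.5, 0.5]) against calibration data so the quantized layer's outputs, not just its weights, stay close to the original.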