Tags: #inference-optimization
vllm-project/semantic-router (AI Infrastructure Component, 3.7k stars)
A signal-driven intelligent router designed to optimize the efficiency, safety, and adaptability of multi-model AI systems across varied environments.
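To make "signal-driven routing" concrete, here is a minimal conceptual sketch: a request is scored against per-model signals and dispatched to the best-matching backend. This is an illustration of the idea only, not semantic-router's actual API; the route names, keyword signals, and model names are all invented for the example (the real project uses far richer signals than keyword overlap).

```python
# Toy signal-driven router: dispatch a prompt to the backend model whose
# "signal" (here, a crude keyword set) matches it best.
# All names below are hypothetical, not semantic-router's interface.

from dataclasses import dataclass, field


@dataclass
class Route:
    model: str                                   # backend model to dispatch to
    keywords: set = field(default_factory=set)   # stand-in for a routing signal


ROUTES = [
    Route("code-model", {"python", "bug", "function", "compile"}),
    Route("math-model", {"integral", "prove", "equation", "solve"}),
]
DEFAULT_MODEL = "general-model"


def route(prompt: str) -> str:
    """Pick the route whose signal overlaps the prompt most; else fall back."""
    tokens = set(prompt.lower().split())
    best, score = DEFAULT_MODEL, 0
    for r in ROUTES:
        overlap = len(tokens & r.keywords)
        if overlap > score:
            best, score = r.model, overlap
    return best
```

A production router would replace the keyword overlap with learned classifiers (task type, safety, cost), but the dispatch shape is the same.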
LMCache/LMCache (LLM Inference Optimization Engine, GPU, 8.0k stars)
LMCache is an LLM serving-engine extension that significantly reduces Time-To-First-Token (TTFT) and boosts throughput, especially in long-context scenarios, by intelligently reusing KV caches.
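The core idea behind KV-cache reuse can be sketched in a few lines: the attention state (KV cache) computed for a token prefix is stored keyed by that prefix, so a later request sharing the prefix skips recomputing it and only prefills the new tail tokens. This is a conceptual model, not LMCache's real interface; the class and method names are assumptions for illustration.

```python
# Conceptual KV-cache reuse: cache the (fake) KV state for token prefixes,
# then serve the longest cached prefix to a new request.
# Hypothetical sketch -- not LMCache's actual API.

import hashlib


class PrefixKVStore:
    def __init__(self):
        self._store = {}  # prefix hash -> (prefix length, KV blob)

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(repr(tuple(tokens)).encode()).hexdigest()

    def insert(self, tokens, kv):
        """Record the KV state computed for this exact token prefix."""
        self._store[self._key(tokens)] = (len(tokens), kv)

    def lookup(self, tokens):
        """Return (reused_len, kv) for the longest cached prefix of `tokens`."""
        for end in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:end]))
            if hit is not None:
                return hit
        return 0, None


store = PrefixKVStore()
store.insert([1, 2, 3], kv="kv-for-123")
reused, kv = store.lookup([1, 2, 3, 4, 5])  # shares the prefix [1, 2, 3]
# Only tokens 4 and 5 now need fresh prefill work.
```

In a real deployment the blobs are GPU tensors and the store spans GPU memory, CPU RAM, and disk, which is where the TTFT savings come from.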
algorithmicsuperintelligence/optillm (AI Inference Proxy, Python, 3.4k stars)
OptiLLM is an OpenAI API-compatible inference proxy that improves LLM accuracy and performance on reasoning tasks using 20+ state-of-the-art inference-time optimization techniques, with no additional training required.
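Because the proxy is OpenAI API-compatible, any OpenAI-style client can target it by overriding the base URL. The sketch below builds such a request payload without sending it; the localhost address, port, and the "moa-" technique prefix on the model name are illustrative assumptions (check the project's README for its real defaults and technique names), and the helper function is invented for this example.

```python
# Hedged sketch of calling an OpenAI-compatible proxy: build a chat-completion
# payload aimed at a local proxy endpoint. URL, port, and the technique
# prefix are assumptions, not verified OptiLLM defaults.

def optillm_request(prompt: str,
                    technique: str = "moa",          # assumed technique prefix
                    base_model: str = "gpt-4o-mini") -> dict:
    """Build an OpenAI-style chat payload routed through a local proxy."""
    return {
        "base_url": "http://localhost:8000/v1",      # assumed proxy address
        "model": f"{technique}-{base_model}",        # technique picked via model name
        "messages": [{"role": "user", "content": prompt}],
    }


payload = optillm_request("Solve: what is 17 * 24?")
```

The appeal of this design is that existing OpenAI client code needs no changes beyond the base URL and model string.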
intel/auto-round (AI Optimization Library, Python, 1.0k stars)
AutoRound is an advanced quantization toolkit for large language models (LLMs) and vision-language models (VLMs), enabling high-accuracy, ultra-low-bit inference across diverse hardware.
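To ground "ultra-low-bit quantization", here is a toy sketch of the underlying idea: weights are scaled into a small integer range and rounded, and a tunable per-weight rounding offset can nudge round-to-nearest decisions to better preserve model outputs. This illustrates the general learned-rounding concept only; it is not intel/auto-round's actual algorithm or API, and the function names are invented.

```python
# Toy symmetric low-bit quantization with optional per-weight rounding
# offsets (the knob a learned-rounding method would tune).
# Conceptual sketch -- not AutoRound's implementation.

def quantize(weights, bits=4, offsets=None):
    """Map float weights to signed `bits`-bit integers plus a scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    offsets = offsets or [0.0] * len(weights)
    q = []
    for w, v in zip(weights, offsets):
        # round-to-nearest plus a tunable offset, clamped to the int range
        qi = int(round(w / scale + v))
        q.append(max(-qmax - 1, min(qmax, qi)))
    return q, scale


def dequantize(q, scale):
    """Recover a float approximation from the integers and the scale."""
    return [qi * scale for qi in q]


q, s = quantize([0.11, -0.42, 0.35, 0.07], bits=4)
recovered = dequantize(q, s)   # low-bit approximation of the weights
```

A learned-rounding method tunes the `offsets` (within about [-0.5, 0.5]) against calibration data so the quantized layer's outputs, not just its weights, stay close to the original.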