Tags: #inference-optimization

vllm-project/semantic-router · AI Infrastructure Component · 3.7k stars

A signal-driven intelligent router designed to optimize the efficiency, safety, and adaptability of multi-model AI systems across various environments.

LMCache/LMCache · LLM Inference Optimization Engine · GPU · 8.0k stars

LMCache is an LLM serving engine extension designed to significantly reduce Time-To-First-Token (TTFT) and boost throughput, especially in long-context scenarios, by intelligently reusing KV caches.

algorithmicsuperintelligence/optillm · AI Inference Proxy · Python · 3.4k stars

OptiLLM is an OpenAI API-compatible inference proxy that boosts LLM accuracy and performance on reasoning tasks using 20+ state-of-the-art optimization techniques, with no additional training required.
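Because the proxy speaks the OpenAI API, existing clients can simply be pointed at it. Below is a minimal sketch using the official openai Python client; the local port and the technique-via-model-name-prefix convention (e.g. "moa-" for mixture-of-agents) are assumptions drawn from the project's README and may differ in your deployment.

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally running OptiLLM proxy
# (host and port are assumptions; adjust to your deployment).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

# Technique selection via a model-name prefix is assumed from the project's
# docs; the underlying base model here is purely illustrative.
response = client.chat.completions.create(
    model="moa-gpt-4o-mini",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)
print(response.choices[0].message.content)
```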

intel/auto-round · AI Optimization Library · Python · 1.0k stars

AutoRound is an advanced quantization toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs), enabling high-accuracy, ultra-low-bit inference across diverse hardware.
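As an illustration of the weight-only quantization workflow, here is a minimal sketch following the pattern shown in the project's README; the exact class and method signatures (AutoRound, quantize, save_quantized) and the bits/group_size values are assumptions and may vary between releases.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # small model purely for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit, group-size-128 weight-only quantization (values are illustrative)
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()

# Export in the project's native format for later low-bit inference
autoround.save_quantized("./opt-125m-w4g128", format="auto_round")
```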
