LLM Inference Optimization Engine
GPU
8.0k
LMCache/LMCache
LMCache is an LLM serving engine extension designed to significantly reduce Time-To-First-Token (TTFT) and boost throughput, especially for long-context scenarios, by intelligently reusing KV caches.
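The core idea behind KV-cache reuse can be sketched without LMCache's actual API: store the KV state for a token prefix (e.g. a long shared system prompt), and on a new request prefill only the uncached suffix, which is what cuts TTFT. The class and method names below are hypothetical, purely illustrative ones, not LMCache's interface.

```python
from hashlib import sha256

class PrefixKVCache:
    """Toy KV-cache store: maps a token-prefix hash to precomputed KV state.
    Illustrative only -- not LMCache's real data structures or API."""

    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        return sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def put(self, tokens, kv_state):
        self._store[self._key(tokens)] = kv_state

    def longest_prefix(self, tokens):
        # Scan from the full sequence down to length 1, returning the
        # longest cached prefix and its KV state ([], None if no hit).
        for end in range(len(tokens), 0, -1):
            state = self._store.get(self._key(tokens[:end]))
            if state is not None:
                return tokens[:end], state
        return [], None

cache = PrefixKVCache()
system_prompt = [101, 7, 7, 42]        # long shared context, cached once
cache.put(system_prompt, kv_state="kv(system)")

request = system_prompt + [9, 13]      # new request reusing the prefix
hit, state = cache.longest_prefix(request)
suffix = request[len(hit):]            # only these 2 tokens need prefill
```

In a real serving engine the payoff grows with context length: a 30k-token document prefix is prefilled once and every follow-up question pays only for its own few tokens.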
LLM Fine-tuning Framework
Python
2.7k
stochasticai/xTuring
xTuring simplifies the fine-tuning, evaluation, and deployment of open-source Large Language Models (LLMs) on private data, ensuring privacy and efficiency.
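What makes fine-tuning cheap enough to run on private data locally is usually a parameter-efficient method such as LoRA: the pretrained weight stays frozen and only a small low-rank correction is trained. A minimal sketch of that idea (plain Python, hypothetical names, not xTuring's API):

```python
# Toy LoRA-style adapter: base weight frozen, only the low-rank factors
# A and B would receive gradient updates during fine-tuning.
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

base_W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight (2x2)
A = [[0.1], [0.0]]                   # trainable low-rank factor (2x1)
B = [[0.0, 0.2]]                     # trainable low-rank factor (1x2)

def effective_W():
    delta = matmul(A, B)             # rank-1 update, here 2x2
    return [[base_W[i][j] + delta[i][j] for j in range(2)] for i in range(2)]

W = effective_W()
trainable = sum(len(r) for r in A) + sum(len(r) for r in B)  # 4 params
frozen = sum(len(r) for r in base_W)                         # 4 here; billions in a real LLM
```

At LLM scale the trainable adapter is a tiny fraction of the model, which is why the full model (and the private data) never has to leave the user's machine.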
AI/ML Inference SDK
Android
8.0k
NexaAI/nexa-sdk
A high-performance local inference framework for running frontier multimodal AI models on various devices with minimal energy consumption.
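Running frontier models on-device within tight memory and energy budgets typically depends on weight quantization. The sketch below shows symmetric 8-bit quantization in plain Python; the helper names are hypothetical and this is the general technique, not Nexa SDK's implementation.

```python
# Symmetric int8 quantization: each float32 weight becomes one byte,
# a 4x memory reduction at a small accuracy cost -- the kind of
# compression on-device inference runtimes rely on.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

Smaller weights also mean fewer bytes moved per token, and on mobile hardware memory traffic dominates energy use, which is how quantization translates into lower power draw, not just a smaller download.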