Tags: #ai-inference
bentoml/BentoML
A Python library for building, deploying, and scaling AI/ML model inference APIs and serving systems.
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving, powering LLM services such as Moonshot AI's Kimi.
gpustack/gpustack
An open-source GPU cluster manager that orchestrates high-performance AI inference engines like vLLM and SGLang for efficient model deployment across diverse environments.
kserve/kserve
A standardized, scalable, multi-framework platform for deploying generative and predictive AI models on Kubernetes.
openvinotoolkit/openvino
OpenVINO is an open-source toolkit designed to optimize and deploy deep learning models for efficient AI inference across a wide range of hardware platforms.
nunchaku-ai/nunchaku
Nunchaku is a high-performance inference engine for 4-bit quantized neural networks, especially diffusion models, that delivers faster and more memory-efficient execution.
collabora/WhisperLive
A near-real-time speech-to-text application that uses OpenAI's Whisper model to transcribe live audio.
edwko/OuteTTS
A versatile interface for OuteTTS models, providing flexible text-to-speech generation capabilities across various AI inference backends and hardware platforms.
nunchaku-ai/ComfyUI-nunchaku
A ComfyUI plugin that integrates Nunchaku, an efficient inference engine for 4-bit quantized neural networks, to accelerate AI model execution.