Tags: #ai-inference

AI/ML Model Serving Framework

Python

8.6k

bentoml/BentoML

A Python library for building, deploying, and scaling AI/ML model inference APIs and serving systems.

mlops, model serving, ai inference

LLM Serving Platform

Python

5.2k

kvcache-ai/Mooncake

A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.

llm serving, kvcache, disaggregated architecture

GPU Cluster Management Platform

Docker

4.9k

gpustack/gpustack

An open-source GPU cluster manager that orchestrates high-performance AI inference engines like vLLM and SGLang for efficient model deployment across diverse environments.

gpu-management ai-inference llm-deployment

AI Inference Platform

Kubernetes

5.4k

kserve/kserve

A standardized, scalable, multi-framework platform for deploying generative and predictive AI models on Kubernetes.

kubernetes, ai inference, generative ai

AI Inference and Deployment Toolkit

Python

10.1k

openvinotoolkit/openvino

OpenVINO is an open-source toolkit designed to optimize and deploy deep learning models for efficient AI inference across a wide range of hardware platforms.

ai inference, deep learning, model optimization

AI Inference Engine

3.8k

nunchaku-ai/nunchaku

Nunchaku is a high-performance AI inference engine that optimizes 4-bit neural networks, especially diffusion models, for faster and more memory-efficient execution.

ai-inference quantization diffusion-models

Real-time Speech-to-Text Application

Python

4.0k

collabora/WhisperLive

A highly optimized, nearly live speech-to-text application that uses OpenAI's Whisper model for real-time audio transcription.

speech-to-text real-time transcription

Replaces:

Commercial Real-time Transcription Services, OpenAI Whisper API (for real-time applications)

AI/ML Library & SDK

Python

1.4k

edwko/OuteTTS

A versatile interface for OuteTTS models, providing flexible text-to-speech generation capabilities across various AI inference backends and hardware platforms.

text-to-speech ai-inference python-library

Replaces:

Google Cloud Text-to-Speech, Amazon Polly...

ComfyUI Plugin

ComfyUI

2.9k

nunchaku-ai/ComfyUI-nunchaku

A ComfyUI plugin that integrates Nunchaku, an efficient inference engine for 4-bit quantized neural networks, to accelerate AI model execution.

comfyui ai-inference quantization
