Tags: #ai-inference
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.
gpustack/gpustack
An open-source GPU cluster manager that orchestrates high-performance AI inference engines across diverse environments, optimizing model deployment and resource utilization.
kserve/kserve
KServe is a standardized, scalable, and multi-framework platform for deploying and serving both generative and predictive AI models on Kubernetes.
openvinotoolkit/openvino
OpenVINO is an open-source toolkit designed to optimize and deploy deep learning models for efficient AI inference across diverse hardware platforms, from edge to cloud.
cheahjs/free-llm-api-resources
A comprehensive, curated list of providers offering free or trial access to Large Language Model (LLM) inference APIs, detailing the available models and their usage limits.
withcatai/node-llama-cpp
A Node.js library providing bindings for llama.cpp, enabling local AI model inference with features such as JSON schema enforcement and function calling.
edwko/OuteTTS
A versatile interface for OuteTTS models, providing flexible text-to-speech generation capabilities across various AI inference backends and hardware platforms.
Stability-AI/stability-sdk
A Python SDK and CLI for programmatic access to Stability AI's generative AI APIs, enabling image generation, upscaling, and animation.
leejet/stable-diffusion.cpp
A lightweight, pure C/C++ inference engine for various diffusion models, enabling efficient image and video generation across multiple platforms and hardware.
LykosAI/StabilityMatrix
A multi-platform package manager and inference UI designed to simplify the installation, updating, and management of various Stable Diffusion web UIs and related AI tools.
nunchaku-ai/ComfyUI-nunchaku
An efficient ComfyUI plugin for accelerated 4-bit neural network inference, leveraging Nunchaku and SVDQuant for enhanced performance in AI image generation workflows.