AI Inference Platform
5.4k 2026-04-26
kserve/kserve
A standardized, scalable, multi-framework platform for deploying generative and predictive AI models on Kubernetes.
Core Features
Scalable Generative & Predictive AI Inference on Kubernetes.
Multi-framework support (TensorFlow, PyTorch, Hugging Face, etc.) with optimized backends for LLMs.
Advanced features like GPU acceleration, model caching, KV cache offloading, and intelligent routing.
Request-based autoscaling with scale-to-zero for cost efficiency.
Support for advanced deployments (canary, pipelines) and model explainability.
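To make the autoscaling and canary features above more concrete, here is a minimal Python sketch that builds an InferenceService manifest as a plain dictionary. The helper build_inference_service, the service name, and the storage URI are hypothetical and exist only for illustration. The field names follow the serving.kserve.io/v1beta1 API as we understand it, and scale-to-zero is assumed to require a deployment mode that supports it (for example the Knative-based serverless mode). Check the KServe documentation for your installed version before relying on it.

```python
import json


def build_inference_service(name, model_format, storage_uri,
                            min_replicas=0, max_replicas=3,
                            canary_percent=None):
    """Hypothetical helper: return an InferenceService manifest as a dict."""
    predictor = {
        # minReplicas of 0 asks for scale-to-zero when there is no traffic.
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "model": {
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        },
    }
    if canary_percent is not None:
        # Share of traffic sent to the newest revision during a canary rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }


# Illustrative values only: the name and bucket path are placeholders.
manifest = build_inference_service(
    name="example-model",
    model_format="sklearn",
    storage_uri="gs://example-bucket/models/example-model",
    canary_percent=10,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.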
Detailed Introduction
KServe is a Cloud Native Computing Foundation (CNCF) incubating project that provides a standardized, distributed platform for deploying and serving both generative and predictive AI models on Kubernetes. It brings both kinds of inference under a single platform aimed at enterprise-scale workloads. KServe supports multiple machine learning frameworks and includes GPU acceleration, request-based autoscaling with scale-to-zero, model caching, and deployment strategies such as canary rollouts and pipelines, which helps keep model serving cost-efficient and robust.