Tags: #model-serving
bentoml/BentoML
A Python library for building, deploying, and scaling AI/ML model inference APIs and serving systems.
ollama/ollama
Run open-source large language models locally on your machine with a simple CLI, REST API, and client libraries.
xorbitsai/inference
A unified, production-ready inference API for deploying and serving open-source language, speech, and multimodal AI models across different kinds of infrastructure.
modular/modular
A unified, open platform for accelerating AI model serving and scaling GenAI deployments, aimed at high performance across a range of hardware.
clearml/clearml
ClearML streamlines AI/ML/LLM workflows with integrated experiment tracking, data management, MLOps/LLMOps orchestration, and model serving.
SeldonIO/seldon-core
An MLOps and LLMOps framework for deploying, managing, and scaling AI systems on Kubernetes, from single models to complex data-centric applications.
predibase/lorax
A multi-LoRA inference server designed to serve thousands of fine-tuned LLMs on a single GPU, significantly reducing serving costs while maintaining high throughput and low latency.