Tags: #model-serving
bentoml/BentoML
A Python library for building and deploying high-performance AI model inference APIs and multi-model serving systems with ease.
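A minimal sketch of a BentoML (1.2+) service, using its `@bentoml.service` and `@bentoml.api` decorators; the echo logic stands in for a real model:

```python
import bentoml

@bentoml.service
class EchoService:
    @bentoml.api
    def predict(self, text: str) -> str:
        # A real service would run model inference here.
        return text.upper()
```

Running `bentoml serve` against the file containing the class typically exposes `predict` as an HTTP endpoint.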
ollama/ollama
Easily run, manage, and interact with open-source large language models locally on your machine.
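A quick sketch of calling Ollama's local REST API, assuming the daemon is running on its default port (11434) and the model has already been pulled (e.g. `ollama pull llama3`):

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "Why is the sky blue?",
        "stream": False,  # return a single JSON object instead of a stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```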
xorbitsai/inference
A unified, production-ready inference API for effortlessly deploying and serving open-source language, speech, and multimodal AI models across various environments.
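Xinference serves an OpenAI-compatible endpoint, so a standard `openai` client can talk to it; in this sketch the server address (9997 is the documented local default) and model name are assumptions about a model you have already launched:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="qwen2.5-instruct",  # illustrative; use whatever model you launched
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```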
modular/modular
A unified, open platform for accelerating AI model serving and scaling GenAI deployments with industry-leading performance across various hardware.
containers/ramalama
RamaLama simplifies serving AI models from any source, locally and in production, by leveraging familiar container patterns and eliminating complex host system configuration.

clearml/clearml
ClearML is an open-source MLOps/LLMOps solution that streamlines the entire AI workflow, from experiment management and data versioning to pipeline orchestration and model serving.
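A minimal experiment-tracking sketch with the `clearml` SDK; the project name, task name, and logged values are arbitrary placeholders:

```python
from clearml import Task

task = Task.init(project_name="demo", task_name="baseline")
task.connect({"lr": 0.001, "epochs": 10})  # log hyperparameters
# ... training code would go here ...
task.get_logger().report_scalar("loss", "train", value=0.42, iteration=1)
```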
SeldonIO/seldon-core
An MLOps and LLMOps framework for deploying, managing, and scaling modular, data-centric AI applications and models on Kubernetes.
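Models deployed with Seldon are typically reached over the Open Inference Protocol (V2); this is a hedged sketch where the host, model name, and tensor shape are assumptions about a particular deployment:

```python
import requests

payload = {
    "inputs": [
        {"name": "predict", "shape": [1, 4], "datatype": "FP32",
         "data": [5.1, 3.5, 1.4, 0.2]}
    ]
}
resp = requests.post(
    "http://localhost:9000/v2/models/iris/infer", json=payload, timeout=30
)
print(resp.json()["outputs"])
```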
jundot/omlx
An optimized LLM inference server for Apple Silicon, featuring continuous batching, tiered KV caching, and macOS menu bar management for efficient local AI.
predibase/lorax
A multi-LoRA inference server designed to efficiently serve thousands of fine-tuned large language models on a single GPU, drastically cutting serving costs while maintaining high throughput and low latency.
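LoRAX routes each request to a LoRA adapter via the `adapter_id` parameter on its `/generate` endpoint; this sketch assumes a server running on the default port (8080) and uses an adapter id from the project's own examples:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "[INST] What is deep learning? [/INST]",
        "parameters": {
            "max_new_tokens": 64,
            "adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k",
        },
    },
)
print(resp.json()["generated_text"])
```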
vllm-project/vllm-omni
vLLM-Omni is an efficient, flexible, and easy-to-use framework extending vLLM to serve omni-modality models (text, image, video, audio) with high throughput and an OpenAI-compatible API.
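Since the project advertises an OpenAI-compatible API, a standard `openai` client call should work against it; the port and model name below are assumptions for a locally running server, not confirmed defaults:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",  # illustrative omni-modality model
    messages=[{"role": "user", "content": "Describe this image in one line."}],
)
print(resp.choices[0].message.content)
```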