LLM Serving Framework
12.3k 2026-04-26
bentoml/OpenLLM
A framework for easily self-hosting and serving any open-source Large Language Models as OpenAI-compatible API endpoints in the cloud.
Core Features
Run a wide range of open-source and custom LLMs.
Expose LLMs via OpenAI-compatible API endpoints.
Includes a built-in chat UI for interaction.
Supports enterprise-grade cloud deployment with Docker, Kubernetes, and BentoCloud.
Utilizes state-of-the-art inference backends for optimal performance.
Quick Start
pip install openllmDetailed Introduction
OpenLLM simplifies the complex process of deploying and managing Large Language Models (LLMs) by enabling developers to self-host any open-source or custom LLM and expose it through an OpenAI-compatible API. This provides flexibility and control over model deployment, reducing reliance on third-party services. With features like a built-in chat UI, advanced inference backends, and streamlined integration with Docker, Kubernetes, and BentoCloud, OpenLLM offers an efficient solution for creating scalable, enterprise-grade LLM applications in the cloud.