vllm-project/vllm-omni
vLLM-Omni is an efficient, flexible, and easy-to-use framework extending vLLM to serve omni-modality models (text, image, video, audio) with high throughput and an OpenAI-compatible API.
Core Features
Detailed Introduction
vLLM-Omni extends the highly efficient vLLM framework to support inference and serving for omni-modality models, encompassing text, image, video, and audio data. It addresses the limitations of traditional LLM serving by integrating non-autoregressive architectures and enabling heterogeneous outputs. The framework prioritizes speed through advanced KV cache management and pipelined execution, while offering flexibility via heterogeneous pipeline abstraction and seamless integration with Hugging Face models. It aims to make omni-modality model serving accessible, fast, and cost-effective for everyone.
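Since vLLM-Omni exposes an OpenAI-compatible API, a client can send mixed text-and-image requests using the standard `chat/completions` message schema. The sketch below builds such a request payload; the server URL and model name are illustrative assumptions, not values confirmed by this project.

```python
import json

# Assumption: vLLM-style servers conventionally listen on port 8000
# under the /v1 prefix; adjust for your deployment.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, text: str, image_url: str) -> dict:
    """Build an OpenAI-style chat.completions payload that mixes
    a text prompt with an image, per the multimodal content format."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


payload = build_chat_request(
    "Qwen/Qwen2.5-Omni-7B",  # hypothetical model name for illustration
    "Describe this image.",
    "https://example.com/cat.png",
)
print(json.dumps(payload, indent=2))
```

This payload can then be POSTed to `{BASE_URL}/chat/completions`, or passed through any OpenAI SDK pointed at the server's base URL.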