OSS Alternative - Discover Top Open Source Alternatives to Popular Software

PaddlePaddle/FastDeploy

A high-performance inference and deployment toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs) based on PaddlePaddle.

Core Features

Load-balanced PD decomposition for optimized resource utilization

Unified high-performance KV cache transfer

OpenAI API service and vLLM compatibility

Comprehensive quantization format support (W8A16, W4A16, FP8, etc.)

Advanced acceleration techniques like speculative decoding and multi-token prediction

Extensive multi-hardware support (NVIDIA, Kunlunxin, Hygon, etc.)

Detailed Introduction

FastDeploy is a production-grade inference and deployment toolkit built on PaddlePaddle, designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It offers out-of-the-box solutions with core features like load-balanced PD decomposition, unified KV cache transfer, and compatibility with OpenAI API services and vLLM. The toolkit supports various quantization formats and advanced acceleration techniques, ensuring high-performance and efficient deployment across a wide range of hardware platforms, including NVIDIA, Kunlunxin, and Hygon GPUs/XPUs.