Ecosystem & Stack: vLLM
LazyAGI/LazyLLM
LazyLLM simplifies the creation and iterative optimization of multi-agent large language model (LLM) applications with a low-code approach.
LMCache/LMCache
LMCache is an LLM serving engine extension that reduces Time-To-First-Token (TTFT) and increases throughput by reusing KV caches across storage tiers and serving instances.
OpenRLHF/OpenRLHF
An easy-to-use, scalable, and high-performance open-source framework for Reinforcement Learning from Human Feedback (RLHF), leveraging Ray and vLLM for distributed training of LLMs and VLMs.
katanaml/sparrow
A production-ready platform for structured data extraction and instruction calling using ML, LLM, and Vision LLM technologies.
mostlygeek/llama-swap
llama-swap enables hot-swapping and management of multiple local generative AI models, acting as a unified API gateway compatible with the OpenAI and Anthropic APIs.
tencentmusic/cube-studio
An open-source, cloud-native, all-in-one MLOps platform designed for the full lifecycle management of machine learning, deep learning, and large language model development and deployment.
ludwig-ai/ludwig
A low-code, declarative framework for building and deploying custom large language models (LLMs) and other deep neural networks.
microsoft/fara
An ultra-compact, 7B-parameter AI agent from Microsoft that automates multi-step computer tasks through visual perception and direct interaction with the user interface.
oumi-ai/oumi
An end-to-end platform for fine-tuning, evaluating, and deploying open-source Large Language Models (LLMs) and Vision Language Models (VLMs).
bespokelabsai/curator
A Python library for generating and curating high-quality synthetic data for AI model training and structured data extraction.
FunAudioLLM/CosyVoice
CosyVoice is a multilingual, LLM-based text-to-speech system offering state-of-the-art voice generation and voice cloning, along with full-stack deployment capabilities.
ModelCloud/GPTQModel
A toolkit for quantizing (compressing) large language models (LLMs), with hardware acceleration across a range of GPUs and CPUs and integration with popular inference frameworks.
alibaba/ROLL
An efficient, user-friendly library for scaling reinforcement learning (RL) with large language models.
PKU-Alignment/align-anything
A modular framework for aligning any-modality large models with human intentions and values using various fine-tuning and reinforcement learning methods.
ymcui/Chinese-LLaMA-Alpaca-2
An open-source project providing Chinese LLaMA-2 and Alpaca-2 large language models with an expanded Chinese vocabulary, improved capabilities, and support for long contexts of up to 64K tokens.
2noise/ChatTTS
A generative speech model optimized for natural, expressive dialogue in LLM assistants, featuring fine-grained prosodic control.
edwko/OuteTTS
A versatile interface for OuteTTS models, providing flexible text-to-speech generation capabilities across various AI inference backends and hardware platforms.
canopyai/Orpheus-TTS
Orpheus TTS is a state-of-the-art, open-source text-to-speech system built on a Llama-3B backbone, designed to generate human-sounding, emotionally rich speech with low latency.
vllm-project/vllm-ascend
A community-maintained hardware plugin that enables vLLM to run seamlessly and efficiently on Ascend NPUs for large language model inference.