Tags: #gpu-acceleration
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models, optimizing inference throughput and latency.
Lightning-AI/pytorch-lightning
Streamlines complex deep learning engineering, enabling scalable AI model training and finetuning across diverse hardware with minimal code changes.
hpcaitech/ColossalAI
An open-source framework designed to make large AI model training and inference cheaper, faster, and more accessible through advanced distributed computing and memory optimization techniques.
NVIDIA-NeMo/Curator
A GPU-accelerated, scalable toolkit for multimodal data preprocessing and curation, designed to train better AI models faster.
ModelCloud/GPTQModel
A toolkit for quantizing (compressing) large language models (LLMs) with hardware acceleration across various GPUs and CPUs, integrating with popular inference frameworks.
alibaba/ROLL
An efficient, user-friendly library for scaling reinforcement learning with large language models across large GPU resources.
withcatai/node-llama-cpp
A Node.js binding for llama.cpp, enabling local execution of large language models with advanced features like JSON schema enforcement and function calling.
edwko/OuteTTS
A versatile interface for OuteTTS models, providing flexible text-to-speech generation capabilities across various AI inference backends and hardware platforms.
janhq/cortex.cpp
A local AI platform for running vision, speech, and language models on diverse hardware behind an OpenAI-compatible API.
GeeeekExplorer/nano-vllm
A lightweight, optimized Python library for fast offline large language model inference, offering performance comparable to (or better than) vLLM with a more readable codebase.
Lightning-AI/litgpt
A high-performance, no-abstraction toolkit providing recipes for pretraining, finetuning, and deploying over 20 large language models at scale.