Tags: #gpu-optimization
AI Development & Deployment Platform
pip
25.9k
modular/modular
A unified, open platform for accelerating AI model serving and scaling GenAI deployments with industry-leading performance across various hardware.
Large Language Model (LLM) Training Framework
6.6k
yangjianxin1/Firefly
Firefly is an open-source, all-in-one tool for efficient pre-training, instruction fine-tuning, and Direct Preference Optimization (DPO) of a wide range of mainstream large language models, optimized for resource-constrained environments.
LLM Inference Optimization Library
Python
17.0k
lyogavin/airllm
Optimizes large language model inference to run 70B models on a single 4GB GPU without quantization, enabling efficient deployment on resource-constrained hardware.
AI/ML Inference Server
Docker
3.8k
predibase/lorax
A multi-LoRA inference server designed to serve thousands of fine-tuned LLMs on a single GPU, significantly reducing serving costs while maintaining high throughput and low latency.