Tags: #gpu-optimization
AI Development & Deployment Platform
pip
25.9k
modular/modular
A unified, open platform for accelerating AI model serving and scaling GenAI deployments, with industry-leading performance across a range of hardware.
Large Language Model Training Tool
Hugging Face
6.7k
yangjianxin1/Firefly
Firefly is an open-source toolkit for efficient large language model training, supporting pre-training, instruction fine-tuning, and DPO, including parameter-efficient methods such as LoRA and QLoRA.
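As a rough illustration of the LoRA idea such toolkits build on (a conceptual sketch, not Firefly's actual code): a frozen pretrained weight matrix W is augmented with a trainable low-rank update B @ A, scaled by alpha / r, so only the small factors need gradients. All sizes below are illustrative, not Firefly defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16  # illustrative dimensions and rank

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-initialized, so the delta starts at 0

def lora_forward(x):
    # Base path plus low-rank adapter path, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B still zeroed, the adapter contributes nothing yet:
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B (2 * r * d parameters instead of d * d) are trained, the optimizer state and gradient memory shrink accordingly; QLoRA pushes this further by quantizing the frozen W.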
LLM Inference Optimization Library
Python
16.4k
lyogavin/airllm
AirLLM minimizes inference memory for large language models, enabling 70B models to run on a single 4GB GPU without quantization, and Llama 3.1 405B on 8GB of VRAM.
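The core trick behind this kind of memory reduction, sketched generically below (this is not AirLLM's API), is layer-wise inference: only one transformer layer's weights are resident at a time, so peak memory is roughly one layer plus activations rather than the whole model. `load_layer` here is a hypothetical stand-in for streaming weights from disk.

```python
import numpy as np

def load_layer(i, rng):
    # Stand-in for reading one layer's weights from disk on demand.
    return rng.normal(size=(16, 16)) / 4

def layerwise_forward(x, n_layers=32, seed=0):
    rng = np.random.default_rng(seed)
    for i in range(n_layers):
        W = load_layer(i, rng)  # load just this layer's weights
        x = np.tanh(W @ x)      # apply the layer
        del W                   # free it before loading the next
    return x

out = layerwise_forward(np.ones(16))
print(out.shape)  # (16,)
```

The trade-off is throughput: every forward pass re-reads the weights, so this pattern suits memory-bound single-user inference rather than high-QPS serving.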
LLM Inference Server
Docker
3.8k
predibase/lorax
A multi-LoRA inference server that efficiently serves thousands of fine-tuned large language models on a single GPU, drastically cutting serving costs while maintaining high throughput and low latency.
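A minimal sketch of the multi-LoRA serving pattern (hypothetical names, not LoRAX's API): one shared copy of the base weights, many small per-tenant low-rank adapters, with each request routed to its adapter at forward time.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 4
W_base = rng.normal(size=(d, d))  # single shared copy of the base weights

# Each fine-tuned "model" is just a pair of small low-rank factors.
adapters = {
    name: (np.zeros((d, r)), rng.normal(size=(r, d)) * 0.01)
    for name in ("support-bot", "sql-gen", "summarizer")
}

def serve(x, adapter_name):
    # Shared base path plus the requested tenant's low-rank delta.
    B, A = adapters[adapter_name]
    return W_base @ x + B @ (A @ x)

x = rng.normal(size=d)
y = serve(x, "sql-gen")
```

Since each adapter stores 2 * r * d numbers instead of d * d, thousands of fine-tuned variants fit alongside one base model on a single GPU, which is the source of the cost savings.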