Tags: #gpu-optimization
AI Development & Deployment Platform
pip
25.9k
modular/modular
A unified, open platform for accelerating AI model serving and scaling GenAI deployments, with industry-leading performance across a range of hardware.
Large Language Model Training Tool
Hugging Face
6.7k
yangjianxin1/Firefly
Firefly is an open-source toolkit for efficient large language model training, supporting pre-training, instruction fine-tuning, and DPO, including parameter-efficient methods such as LoRA and QLoRA.
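As a rough illustration of the LoRA idea such toolkits build on (a conceptual sketch, not Firefly's actual code): a frozen pretrained weight matrix W is augmented with a trainable low-rank update B @ A, scaled by alpha / r, so only the small factors need gradients. All sizes below are illustrative, not Firefly defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16  # illustrative dimensions and rank

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-initialized, so the delta starts at 0

def lora_forward(x):
    # Base path plus low-rank adapter path, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B still zeroed, the adapter contributes nothing yet:
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B (2 * r * d parameters instead of d * d) are trained, the optimizer state and gradient memory shrink accordingly; QLoRA pushes this further by quantizing the frozen W.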
LLM Inference Optimization Library
Python
16.4k
lyogavin/airllm
AirLLM minimizes inference memory for large language models, enabling 70B models to run on a single 4GB GPU without quantization, and Llama 3.1 405B on 8GB of VRAM.
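The core trick behind this kind of memory reduction, sketched generically below (this is not AirLLM's API), is layer-wise inference: only one transformer layer's weights are resident at a time, so peak memory is roughly one layer plus activations rather than the whole model. `load_layer` here is a hypothetical stand-in for streaming weights from disk.

```python
import numpy as np

def load_layer(i, rng):
    # Stand-in for reading one layer's weights from disk on demand.
    return rng.normal(size=(16, 16)) / 4

def layerwise_forward(x, n_layers=32, seed=0):
    rng = np.random.default_rng(seed)
    for i in range(n_layers):
        W = load_layer(i, rng)  # load just this layer's weights
        x = np.tanh(W @ x)      # apply the layer
        del W                   # free it before loading the next
    return x

out = layerwise_forward(np.ones(16))
print(out.shape)  # (16,)
```

The trade-off is throughput: every forward pass re-reads the weights, so this pattern suits memory-bound single-user inference rather than high-QPS serving.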
LLM Inference Server
Docker
3.8k
predibase/lorax
A multi-LoRA inference server that efficiently serves thousands of fine-tuned large language models on a single GPU, drastically cutting serving costs while maintaining high throughput and low latency.
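A minimal sketch of the multi-LoRA serving pattern (hypothetical names, not LoRAX's API): one shared copy of the base weights, many small per-tenant low-rank adapters, with each request routed to its adapter at forward time.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 4
W_base = rng.normal(size=(d, d))  # single shared copy of the base weights

# Each fine-tuned "model" is just a pair of small low-rank factors.
adapters = {
    name: (np.zeros((d, r)), rng.normal(size=(r, d)) * 0.01)
    for name in ("support-bot", "sql-gen", "summarizer")
}

def serve(x, adapter_name):
    # Shared base path plus the requested tenant's low-rank delta.
    B, A = adapters[adapter_name]
    return W_base @ x + B @ (A @ x)

x = rng.normal(size=d)
y = serve(x, "sql-gen")
```

Since each adapter stores 2 * r * d numbers instead of d * d, thousands of fine-tuned variants fit alongside one base model on a single GPU, which is the source of the cost savings.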