Tags: #inference

Local AI Development Platform

62.9k

unslothai/unsloth

Unsloth Studio is a web UI that enables efficient local training and inference of open-source large language models and other AI models with significant VRAM and speed optimizations.

llm fine-tuning inference

LLM Inference and Serving Engine

Python

78.1k

vllm-project/vllm

vLLM is a high-throughput and memory-efficient open-source library designed for fast and easy serving of large language models.

llm inference serving

AI Model Inference Serving Platform

Python

9.3k

xorbitsai/inference

A unified, production-ready inference API for deploying and serving open-source language, speech, and multimodal AI models on various infrastructures.

llm inference model-serving

Replaces:

OpenAI API

AI/ML Developer Resource

Python

18.3k

meta-llama/llama-cookbook

An official guide and collection of recipes for building applications with the Llama model family, covering inference, fine-tuning, and RAG.

llama llm ai

AI Model Serving Tool

Podman

2.8k

containers/ramalama

RamaLama is an open-source developer tool that simplifies the local serving and production inference of AI models by leveraging familiar container technology.

ai-models containers inference

AI/ML Deployment Toolkit

Python

3.7k

PaddlePaddle/FastDeploy

A high-performance inference and deployment toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs) based on PaddlePaddle.

llm vlm inference

LLM Inference Optimization Engine

vLLM

8.1k

LMCache/LMCache

LMCache is an LLM serving engine extension designed to significantly reduce Time-To-First-Token (TTFT) and boost throughput by intelligently reusing KV caches across various storage tiers and serving instances.

llm kv-cache inference

Technical Guide & Knowledge Base

Cloud computing

17.8k

stas00/ml-engineering

An open collection of methodologies, tools, and step-by-step instructions for successful training, fine-tuning, and inference of large language and multi-modal models.

ml-engineering llm vlm

Curated Resource List

3.8k

mnfst/awesome-free-llm-apis

A comprehensive list of Large Language Model (LLM) APIs offering permanent free tiers for text inference, including provider and third-party inference services.

free llm api ai inference

AI Inference Engine

WebGPU

17.8k

mlc-ai/web-llm

A high-performance, in-browser LLM inference engine with OpenAI API compatibility, leveraging WebGPU for local, private AI.

llm in-browser webgpu

Replaces:

OpenAI API

Curated Resource List

19.6k

cheahjs/free-llm-api-resources

A comprehensive list of free and trial-based LLM inference resources accessible via API.

llm api free-tier

Serverless AI Runtime

Python

1.6k

beam-cloud/beta9

An ultrafast, open-source Pythonic runtime for deploying and scaling serverless GPU inference, sandboxes, and background jobs with zero infrastructure overhead.

serverless gpu ai

Replaces:

Celery

AI/ML Inference Serving Framework

Hugging Face

4.6k

vllm-project/vllm-omni

A framework for efficient, fast, and cheap serving of omni-modality (text, image, video, audio) AI models.

multimodal inference serving

AI/ML Inference Library

C++

2.1k

vitoplantamura/OnnxStream

A lightweight C++ inference library designed to run large ONNX-based AI models like Stable Diffusion XL and Mistral 7B on resource-constrained devices with minimal memory footprint.

onnx inference low-memory
