Tags: #inference

Local AI Studio
NVIDIA CUDA
61.3k

unslothai/unsloth

Unsloth Studio provides a unified web interface for efficiently running and training open-source AI models locally across various operating systems and hardware.

AI Model Serving Platform
Python
9.2k

xorbitsai/inference

A unified, production-ready inference API for effortlessly deploying and serving open-source language, speech, and multimodal AI models across various environments.

AI/ML Developer Resource Hub
Python
18.3k

meta-llama/llama-cookbook

A comprehensive guide and collection of recipes for building with the Llama model family, covering inference, fine-tuning, RAG, and end-to-end use cases.

Developer Tool for AI Model Serving
Docker
2.7k

containers/ramalama

RamaLama simplifies local serving and production inference of AI models from any source by leveraging familiar container patterns, eliminating the need for complex host-system configuration.

Machine Learning Deployment Toolkit
PaddlePaddle
3.7k

PaddlePaddle/FastDeploy

A high-performance inference and deployment toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs) based on PaddlePaddle.

LLM Serving Framework
Docker
12.3k

bentoml/OpenLLM

Self-host and serve any open-source LLM as an OpenAI-compatible API endpoint with ease.
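Because OpenLLM exposes a standard OpenAI-compatible endpoint, any OpenAI-style client can talk to it. A minimal sketch of the chat-completions request shape, assuming a locally running server (the base URL and model name below are illustrative, not OpenLLM defaults):

```python
import json

# Standard OpenAI-style chat-completions payload; any server that speaks
# the OpenAI API (OpenLLM included) accepts this shape at
# /v1/chat/completions. Host, port, and model name here are assumptions.
BASE_URL = "http://localhost:3000/v1"  # hypothetical local OpenLLM server

payload = {
    "model": "llama3.2:1b",  # whichever model the server is hosting
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what OpenLLM does."},
    ],
    "temperature": 0.7,
    "stream": False,
}

body = json.dumps(payload)

# To actually send it (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(
#       f"{BASE_URL}/chat/completions", data=body.encode(), method="POST",
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```

The same payload works unchanged against any other OpenAI-compatible server, which is the point of exposing that interface.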

Machine Learning Engineering Guide
Slurm
17.7k

stas00/ml-engineering

An open collection of methodologies, tools, and step-by-step instructions for successfully training, fine-tuning, and running inference with large language and multi-modal models.

Curated Resource List
2.1k

mnfst/awesome-free-llm-apis

A comprehensive, curated list of Large Language Model (LLM) APIs offering permanent free tiers for text inference.

AI Inference Engine
WebGPU
17.8k

mlc-ai/web-llm

A high-performance, in-browser LLM inference engine with OpenAI API compatibility, leveraging WebGPU for local, private AI.

Serverless AI Runtime
Python
1.6k

beam-cloud/beta9

An ultrafast, open-source Pythonic runtime for deploying and scaling serverless GPU inference, sandboxes, and background jobs with zero infrastructure overhead.

AI Inference Engine & Optimization Library
ComfyUI
3.8k

nunchaku-ai/nunchaku

Nunchaku is a high-performance inference engine that optimizes 4-bit neural networks, especially diffusion models, for speed and efficiency.
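To see roughly what 4-bit quantization trades off, here is a generic symmetric-quantization sketch in plain Python. This is not Nunchaku's actual algorithm or kernels; it only illustrates why 4-bit storage is 8x smaller than float32 at the cost of bounded precision loss:

```python
# Generic symmetric 4-bit quantization sketch (NOT Nunchaku's method):
# map floats to 16 signed integer levels plus one scale factor, then
# dequantize and measure the round-trip error.

def quantize_4bit(weights):
    """Quantize a list of floats to ints in [-8, 7] plus a scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.97, -0.99, 0.01, 0.44]
q, scale = quantize_4bit(weights)
restored = dequantize_4bit(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # 4-bit integer codes, e.g. [1, -4, 7, -7, 0, 3]
print(max_err)  # reconstruction error, bounded by ~scale/2
```

Production engines layer much more on top of this (per-group scales, outlier handling, fused low-bit kernels), which is where the engineering effort in a project like Nunchaku actually goes.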

AI/ML Inference SDK
Android
8.0k

NexaAI/nexa-sdk

A high-performance local inference framework for running frontier multimodal AI models on various devices with minimal energy consumption.

AI Inference Library
ONNX
2.1k

vitoplantamura/OnnxStream

A lightweight C++ inference library for ONNX models, enabling low-memory execution of large AI models like Stable Diffusion XL and Mistral 7B on diverse hardware, from Raspberry Pi Zero 2 to servers.

OSS Alternative

Explore the best open source alternatives to commercial software.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.