Tags: #inference
unslothai/unsloth
Unsloth accelerates fine-tuning and inference of open-source LLMs, training roughly 2x faster with significantly less memory via optimized kernels, and runs on common local hardware.
xorbitsai/inference
A unified, production-ready inference API for effortlessly deploying and serving open-source language, speech, and multimodal AI models across various environments.
meta-llama/llama-cookbook
A comprehensive guide and collection of recipes for building with the Llama model family, covering inference, fine-tuning, RAG, and end-to-end use cases.
containers/ramalama
RamaLama simplifies serving AI models from any source, locally or in production, by using familiar container workflows instead of complex host-system configuration.
PaddlePaddle/FastDeploy
A high-performance inference and deployment toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs) based on PaddlePaddle.
bentoml/OpenLLM
Self-host and serve any open-source LLM as an OpenAI-compatible API endpoint with ease.
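Several tools in this list (OpenLLM, Xinference, web-llm) expose OpenAI-compatible endpoints, so one client sketch covers them all. The snippet below builds a standard `/chat/completions` request with only the Python standard library; the base URL, port, and model name are placeholders, assuming a compatible server is already running locally.

```python
import json
from urllib import request

def chat_request(base_url: str, model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style /chat/completions request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical local endpoint; any OpenAI-compatible server is addressed the same way.
req = chat_request("http://localhost:3000/v1", "llama-3-8b", "Hello!")
print(req.full_url)  # http://localhost:3000/v1/chat/completions
```

To actually send it, pass the request to `urllib.request.urlopen(req)` (or point the official `openai` client at the same base URL) once a server is up.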
stas00/ml-engineering
An open collection of methodologies, tools, and step-by-step instructions for successfully training, fine-tuning, and running inference with large language and multi-modal models.
mnfst/awesome-free-llm-apis
A comprehensive, curated list of Large Language Model (LLM) APIs offering permanent free tiers for text inference.
mlc-ai/web-llm
A high-performance, in-browser LLM inference engine with OpenAI API compatibility, leveraging WebGPU for local, private AI.
beam-cloud/beta9
An ultrafast, open-source Pythonic runtime for deploying and scaling serverless GPU inference, sandboxes, and background jobs with zero infrastructure overhead.
nunchaku-ai/nunchaku
Nunchaku is a high-performance inference engine that optimizes 4-bit neural networks, especially diffusion models, for speed and efficiency.
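To make "4-bit" concrete, here is a generic round-to-nearest weight-quantization sketch. This illustrates the idea only; it is not Nunchaku's actual scheme, and real engines use calibrated per-group scales and fused low-bit kernels.

```python
def quantize_4bit(weights):
    """Symmetric round-to-nearest 4-bit quantization of a list of floats.
    Generic illustration: one scale per tensor, int values clamped to -8..7."""
    scale = max(abs(w) for w in weights) / 7  # map largest weight onto +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from 4-bit integers."""
    return [v * scale for v in q]

w = [0.9, -0.35, 0.02, -0.7]
q, s = quantize_4bit(w)
print(q)                  # small integers in [-8, 7]
print(dequantize(q, s))   # approximations of the original weights
```

Each weight is reduced to 4 bits of storage at the cost of a bounded rounding error (at most half the scale), which is why quantized models trade a little accuracy for large memory and bandwidth savings.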
NexaAI/nexa-sdk
A high-performance local inference framework for running frontier multimodal AI models on various devices with minimal energy consumption.