Ecosystem & Stack: PyTorch
unslothai/unsloth
Unsloth Studio provides a unified web interface for efficiently running and training open-source AI models locally across various operating systems and hardware.
modelscope/ms-swift
A comprehensive framework from ModelScope for efficiently fine-tuning, evaluating, and deploying over 1000 large language models and multimodal large models using advanced techniques.
sgl-project/sglang
A high-performance serving framework designed to accelerate inference for large language models and multimodal AI models.
huggingface/peft
PEFT is a state-of-the-art library for Parameter-Efficient Fine-Tuning, drastically reducing the computational and storage costs of adapting large pretrained models.
Lightning-AI/pytorch-lightning
A deep learning framework that simplifies PyTorch development by automating boilerplate engineering code, enabling scalable training from CPU to multi-node GPUs with minimal code changes.
bentoml/BentoML
A Python library for building and deploying high-performance AI model inference APIs and multi-model serving systems with ease.
llmware-ai/llmware
A unified framework for building local, private, and secure enterprise RAG pipelines using small, specialized LLMs optimized for on-device and edge deployment.
rasbt/LLMs-from-scratch
A comprehensive, step-by-step guide and codebase for building a ChatGPT-like Large Language Model from scratch using PyTorch.
p-e-w/heretic
A tool for automatically removing censorship and safety alignment from transformer-based language models without expensive post-training.
adongwanai/AgentGuide
A comprehensive, job-oriented guide for AI Agent development, covering core technologies, practical projects, and interview preparation.
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.
SwanHubX/SwanLab
An open-source tool with a modern design for tracking and visualizing AI training runs; it integrates with 50+ mainstream frameworks to simplify experiment management for AI teams.
stas00/ml-engineering
An open collection of methodologies, tools, and step-by-step instructions for successfully training, fine-tuning, and running inference with large language and multimodal models.
activeloopai/deeplake
Deep Lake is an AI data runtime and database optimized for deep learning, offering multimodal data storage, querying, vector search, and streaming for LLM and deep learning applications.
bitsandbytes-foundation/bitsandbytes
A PyTorch library that makes large language models more accessible by dramatically reducing memory consumption through k-bit quantization for both inference and training.
ludwig-ai/ludwig
Ludwig is a low-code, declarative deep learning framework designed to simplify the building, training, and deployment of custom AI models, including LLMs and neural networks.
openvinotoolkit/openvino
OpenVINO is an open-source toolkit designed to optimize and deploy deep learning models for efficient AI inference across diverse hardware platforms, from edge to cloud.
hpcaitech/ColossalAI
Colossal-AI makes training and deploying large AI models cheaper, faster, and more accessible through advanced distributed training techniques.
huggingface/datasets
A lightweight library providing a vast hub of ready-to-use datasets and efficient tools for data manipulation in AI and machine learning workflows.
towhee-io/towhee
Towhee is a cutting-edge framework designed to simplify and accelerate neural data processing pipelines, particularly for unstructured multimodal data and LLM orchestration.
docarray/docarray
A Python library for representing, transmitting, storing, and retrieving multimodal data, designed for AI applications.
tianrun-chen/SAM-Adapter-PyTorch
A PyTorch-based framework to adapt Meta AI's Segment Anything Model (SAM) for improved performance on challenging downstream computer vision tasks using adapters and prompts.
ModelCloud/GPTQModel
A toolkit for quantizing (compressing) Large Language Models (LLMs) with hardware acceleration across various GPUs and CPUs, integrating with popular inference frameworks.
X-LANCE/SLAM-LLM
A deep learning toolkit for training custom multimodal large language models focused on speech, language, audio, and music processing.
yangjianxin1/Firefly
Firefly is an open-source toolkit for efficient large language model training, supporting pre-training, instruction fine-tuning, and DPO with methods like LoRA and QLoRA.
zyds/transformers-code
A comprehensive code repository accompanying a hands-on course for mastering Hugging Face Transformers, covering fundamental concepts through advanced fine-tuning and deployment techniques.
lxe/simple-llm-finetuner
A beginner-friendly UI for fine-tuning language models using LoRA on commodity NVIDIA GPUs, though the project is no longer actively maintained.
mymusise/ChatGLM-Tuning
A cost-effective solution for fine-tuning ChatGLM-6B using LoRA, enabling personalized large language models.
hiyouga/ChatGLM-Efficient-Tuning
An efficient toolkit for fine-tuning ChatGLM-6B models using PEFT methods, enabling customization and deployment of large language models.
datawhalechina/self-llm
A comprehensive Linux-based tutorial for deploying and fine-tuning open-source LLMs/MLLMs, tailored for Chinese-speaking beginners.
adapter-hub/adapters
A unified library for parameter-efficient and modular transfer learning, extending Hugging Face Transformers with various adapter methods.
lyogavin/airllm
AirLLM optimizes memory usage during large language model inference, enabling 70B LLMs to run on a single 4GB GPU without quantization, and Llama 3.1 405B to run on 8GB of VRAM.
labmlai/annotated_deep_learning_paper_implementations
A comprehensive collection of PyTorch implementations for over 60 deep learning papers, accompanied by detailed side-by-side notes for enhanced understanding.
camenduru/stable-diffusion-webui-colab
Provides Google Colab notebooks to easily run Stable Diffusion WebUI, including various models and extensions, though it is now outdated and superseded by TostUI.
microsoft/LoRA
A PyTorch library implementing LoRA (Low-Rank Adaptation) to efficiently fine-tune large language models by significantly reducing trainable parameters and storage requirements.
LianjiaTech/BELLE
BELLE is an open-source project dedicated to fostering the development of Chinese conversational large language models, aiming to make LLMs accessible to everyone.
cloneofsimo/lora
A tool for fast and efficient fine-tuning of diffusion models using Low-rank Adaptation (LoRA), producing small, shareable models.
wenge-research/YAYI
YaYi is an open-source Chinese large language model series, built on LLaMA 2 & BLOOM, designed to provide secure, reliable, and domain-specific AI capabilities for enterprise customers through extensive multi-domain instruction tuning.
zai-org/ImageReward
A human preference reward model for evaluating and improving text-to-image generation models.
RLHFlow/RLHF-Reward-Modeling
A comprehensive collection of recipes and code for training various reward models crucial for Reinforcement Learning from Human Feedback (RLHF) in large language models.
OpenLMLab/MOSS-RLHF
An open-source framework providing code, models, and insights for stable Reinforcement Learning from Human Feedback (RLHF) training in Large Language Models, focusing on the PPO algorithm and reward modeling.
huggingface/diffusers
A modular PyTorch library for state-of-the-art diffusion models, enabling easy generation of images, audio, and more.
Sanster/IOPaint
A free and open-source AI-powered tool for advanced image inpainting, outpainting, and object replacement using state-of-the-art models.
carson-katri/dream-textures
Integrates Stable Diffusion directly into Blender, enabling artists to generate textures, concept art, and 3D assets using text prompts without leaving their creative environment.
ashawkey/stable-dreamfusion
A PyTorch implementation of DreamFusion, enabling text-to-3D and image-to-3D content generation using NeRF and Stable Diffusion.
kuprel/min-dalle
A fast, minimal PyTorch port of DALL·E Mini, optimized for efficient text-to-image generation inference.
lucidrains/imagen-pytorch
A PyTorch implementation of Google's Imagen, a state-of-the-art text-to-image neural network that surpasses DALL-E 2 in synthesis quality.
lucidrains/DALLE2-pytorch
A PyTorch implementation of OpenAI's DALL-E 2, enabling advanced text-to-image synthesis through a diffusion-based neural network architecture.
lucidrains/DALLE-pytorch
An open-source PyTorch implementation of OpenAI's DALL-E, a text-to-image transformer, including CLIP for generation ranking.
NVIDIA-NeMo/NeMo
A scalable generative AI framework for building, customizing, and deploying large language models, multimodal models, and speech AI models (ASR, TTS).
snakers4/silero-models
Silero Models offers a collection of pre-trained, end-to-end text-to-speech models designed for simplicity, speed, and natural-sounding speech generation.
fishaudio/Bert-VITS2
An open-source Text-to-Speech system built on the VITS2 backbone, enhanced with multilingual BERT for improved speech synthesis.
denizsafak/abogen
Generate high-quality audiobooks and voiceovers from various text formats with synchronized captions.
babysor/MockingBird
A powerful open-source toolkit for real-time voice cloning and arbitrary speech generation from text.
santinic/audiblez
Generate high-quality audiobooks in .m4b format from .epub e-books using advanced text-to-speech technology, with both command-line and graphical interfaces.
RVC-Boss/GPT-SoVITS
A powerful open-source web UI for few-shot voice conversion and text-to-speech, enabling high-quality voice cloning with minimal audio data.
remsky/Kokoro-FastAPI
A Dockerized FastAPI wrapper providing a high-performance, multi-platform (CPU/GPU) and multi-language API for the Kokoro-82M text-to-speech model, compatible with OpenAI's speech endpoint.
pixeltable/pixeltable
A declarative, transactional Python library for building multimodal AI applications with incremental data storage, transformation, indexing, and orchestration.
EvolvingLMMs-Lab/lmms-eval
A unified, reproducible, and efficient multimodal evaluation toolkit for large language models across text, image, video, and audio tasks.
OpenMOSS/MOSS-TTS
An open-source AI model family for high-fidelity, expressive speech and sound generation across diverse real-world applications.
kyegomez/BitNet
A PyTorch implementation of BitNet, enabling highly efficient 1-bit transformers for large language models.
facebookresearch/mmf
A modular PyTorch-based framework from Facebook AI Research for state-of-the-art vision and language multimodal AI research.
emcf/thepipe
A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.
ZhaoJ9014/face.evoLVe
A high-performance, comprehensive face recognition library built on PaddlePaddle and PyTorch.
dbiir/UER-py
UER-py is an open-source PyTorch-based framework for pre-training and fine-tuning NLP models, offering modularity, extensibility, and a comprehensive model zoo.
liucongg/ChatGLM-Finetuning
A comprehensive toolkit for fine-tuning ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B models using methods such as Freeze, LoRA, P-tuning, and full-parameter fine-tuning.
PhoebusSi/Alpaca-CoT
A unified platform simplifying instruction-tuning, parameter-efficient methods, and large language model integration for researchers and developers.
jianchang512/ChatTTS-ui
Provides a local web interface and API for the ChatTTS model, enabling text-to-speech synthesis with support for mixed languages and numbers.
MoonInTheRiver/DiffSinger
DiffSinger is an official PyTorch implementation of a singing voice synthesis (SVS) and text-to-speech (TTS) system, leveraging a shallow diffusion mechanism for high-quality audio generation.
coqui-ai/TTS
A deep learning toolkit for Text-to-Speech, offering pretrained models, training tools, and dataset utilities.
netease-youdao/EmotiVoice
An open-source, multi-voice, prompt-controlled text-to-speech engine capable of generating speech with diverse emotions in English and Chinese.
yl4579/StyleTTS2
StyleTTS 2 is a text-to-speech model that achieves human-level speech synthesis by leveraging style diffusion and adversarial training with large speech language models.
metavoiceio/metavoice-src
MetaVoice-1B is an open-source, 1.2B-parameter foundation model for highly expressive, human-like text-to-speech synthesis and zero-shot voice cloning.
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X, enabling zero-shot multilingual text-to-speech synthesis and voice cloning with emotion control.
mozilla/TTS
A deep learning library for advanced Text-to-Speech generation, offering high-quality speech synthesis with pretrained models and multi-language support.
KohakuBlueleaf/LyCORIS
A library implementing various parameter-efficient fine-tuning (PEFT) algorithms, including advanced LoRA variants, for Stable Diffusion models to enhance image generation.
nateraw/stable-diffusion-videos
Create dynamic videos by smoothly transitioning between text prompts using Stable Diffusion's latent space exploration.
MrForExample/ComfyUI-3D-Pack
An extensive node suite that integrates cutting-edge 3D generation algorithms and models into ComfyUI, enabling seamless processing of 3D inputs like meshes and UV textures.
numz/ComfyUI-SeedVR2_VideoUpscaler
Official SeedVR2 Video Upscaler for ComfyUI, enabling high-quality video and image upscaling, also runnable as a standalone CLI.