Ecosystem & Stack: CUDA

A PyTorch library that makes large language models more accessible through k-bit quantization, significantly reducing memory consumption for both inference and training.

pytorch quantization llm

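The entry above names k-bit quantization as its memory-saving technique. As a rough illustration only, the sketch below shows a plain absmax scheme in Python. It is an assumed, simplified stand-in, not the library's actual algorithm, and the names quantize_absmax, dequantize, and weight_bytes are hypothetical.

```python
def quantize_absmax(values, bits=8):
    """Map floats to signed integer codes using a single absmax scale.

    Hypothetical helper for illustration; real libraries typically use
    block-wise scales and more elaborate code books.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4 bits, 127 for 8 bits
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def weight_bytes(num_params, bits):
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits // 8


codes, scale = quantize_absmax([0.9, -0.3, 0.1], bits=4)
restored = dequantize(codes, scale)
```

Under these assumptions, 7 billion weights occupy 14 GB at 16 bits and 3.5 GB at 4 bits, before any overhead for storing the scales.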

Deep Learning Adaptation Framework

Python

1.5k

tianrun-chen/SAM-Adapter-PyTorch

A PyTorch-based framework to adapt Meta AI's Segment Anything Model (SAM) for improved performance on challenging downstream computer vision tasks using adapters and prompts.

segmentation computer-vision pytorch


Deep Learning Toolkit / Multimodal LLM Framework

Linux

1.0k

X-LANCE/SLAM-LLM

A deep learning toolkit for training custom multimodal large language models focused on speech, language, audio, and music processing.

multimodal-llm deep-learning speech-processing


LLM Finetuning UI Tool

Python

2.1k

lxe/simple-llm-finetuner

A beginner-friendly UI for fine-tuning large language models (LLMs) using the LoRA method on commodity NVIDIA GPUs.

llm-finetuning lora peft

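Several entries in this list rely on LoRA. As a minimal sketch of the idea, assuming the usual formulation in which a frozen weight matrix W (d x k) receives a trainable low-rank update scaled by alpha / r, the Python below computes the effective weight and the trainable parameter count. The helper names are hypothetical and none of this is taken from the listed repositories.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, with B of shape d x r and A of shape r x k."""
    r = len(a)                      # rank = number of rows in A
    delta = matmul(b, a)            # d x k low-rank update
    s = alpha / r
    return [
        [w_ij + s * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_trainable_params(d, k, r):
    """Parameters trained by LoRA for one d x k matrix: A (r x k) plus B (d x r)."""
    return r * (d + k)


w = [[1.0, 0.0], [0.0, 1.0]]        # frozen weight (toy values)
b = [[1.0], [2.0]]                  # d x r, here r = 1
a = [[3.0, 4.0]]                    # r x k
merged = lora_effective_weight(w, a, b, alpha=1.0)
```

For a single 4096 x 4096 matrix at rank 8, this trains 65,536 parameters instead of 16,777,216, which is why the method fits on commodity GPUs.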

Large Language Model Finetuning Solution

Python

3.8k

mymusise/ChatGLM-Tuning

A cost-effective solution for finetuning ChatGLM-6B with LoRA, enabling personalized large language models.

llm finetuning lora

Replaces:

ChatGPT


Machine Learning Library

PyTorch

33.5k

huggingface/diffusers

A modular PyTorch library for state-of-the-art diffusion models, enabling easy inference and training for image, video, and audio generation.

diffusion models generative ai pytorch


AI-Powered Image Editing Tool

PyTorch

23.0k

Sanster/IOPaint

An open-source, AI-driven tool for advanced image inpainting, outpainting, object removal, and replacement using state-of-the-art models.

ai image-editing inpainting

Replaces:

Photoshop


ComfyUI Custom Nodes / AI Video Generation Plugin

ComfyUI

3.5k

Lightricks/ComfyUI-LTXVideo

Extends ComfyUI with advanced custom nodes for the LTX-2 video generation model, enabling powerful text-to-video and image-to-video workflows.

comfyui video-generation ai-model


Text-to-Image Generation Library

PyTorch

3.5k

kuprel/min-dalle

A fast, minimal PyTorch port of DALL·E Mini for efficient text-to-image generation.

pytorch text-to-image ai


AI Voice Synthesis WebUI

Python

57.1k

RVC-Boss/GPT-SoVITS

A powerful web-based tool for few-shot voice cloning and text-to-speech, enabling high-quality voice generation from minimal audio data.

text-to-speech voice-cloning few-shot-learning

Replaces:

Commercial Text-to-Speech Services, Voice Cloning Software


AI/ML Inference Serving Framework

Hugging Face

4.6k

vllm-project/vllm-omni

A framework for efficient, fast, and cheap serving of omni-modality (text, image, video, audio) AI models.

multimodal inference serving


Large Language Model Finetuning Toolkit

DeepSpeed

2.8k

liucongg/ChatGLM-Finetuning

A toolkit for finetuning ChatGLM series models (ChatGLM-6B, ChatGLM2-6B, ChatGLM3-6B) for downstream NLP tasks, using methods such as Freeze, LoRA, P-tuning, and full-parameter training.

chatglm finetuning llm


AI Multimedia Processing Web Application

Gradio

7.2k

abus-aikorea/voice-pro

An AI-powered web application for comprehensive multimedia content creation, offering speech recognition, voice cloning, text-to-speech, and multilingual translation.

ai voice speech recognition text-to-speech

Replaces:

ElevenLabs


Text-to-Speech Web Interface

Python

7.5k

jianchang512/ChatTTS-ui

Provides a local web interface and API for ChatTTS to synthesize text into speech, supporting mixed Chinese, English, and numbers.

chattts text-to-speech web-ui


Audio Synthesis Framework

Python

4.8k

MoonInTheRiver/DiffSinger

DiffSinger is an official PyTorch implementation of a singing voice synthesis (SVS) and text-to-speech (TTS) system, leveraging a shallow diffusion mechanism for high-quality audio generation.

singing-voice-synthesis text-to-speech diffusion-models


AI Model Implementation / Text-to-Speech System

Python

7.9k

Plachtaa/VALL-E-X

An open-source implementation of Microsoft's VALL-E X, enabling zero-shot multilingual text-to-speech synthesis and voice cloning with emotion control.

tts voice-cloning multilingual

Replaces:

ElevenLabs, Commercial Text-to-Speech (TTS) services


AI Generative Art WebUI

Docker

7.1k

vladmandic/sdnext

SD.Next is a powerful, all-in-one open-source WebUI for AI generative image and video creation, offering extensive model support, advanced processing, and cross-platform compatibility.

ai image generation stable diffusion webui


AI Video Generation Tool

Python

4.7k

nateraw/stable-diffusion-videos

Create dynamic and visually captivating videos by smoothly morphing between different text prompts using Stable Diffusion.

stable-diffusion ai-video-generation text-to-video


AI 3D Asset Generation Node Suite

ComfyUI

3.7k

MrForExample/ComfyUI-3D-Pack

An extensive node suite that integrates advanced 3D input processing and asset generation into ComfyUI using cutting-edge AI algorithms and models.

comfyui 3d generation nerf


Deep Learning Library / LLM Architecture Implementation

PyTorch

12.1k

kyegomez/OpenMythos

An open-source, theoretical reconstruction of the Claude Mythos LLM architecture, featuring a Recurrent-Depth Transformer and sparse Mixture of Experts for advanced reasoning.

llm architecture recurrent transformer mixture of experts


AI-powered Content Transformation Tool

Python

5.1k

BIT-DataLab/Edit-Banana

Edit Banana transforms static, uneditable content like images of diagrams into fully manipulatable and editable assets using advanced AI.

ai multimodal-ai diagram-editor

Replaces:

Microsoft Visio, Lucidchart
