Ecosystem & Stack: CUDA
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving, powering leading AI services.
tianrun-chen/SAM-Adapter-PyTorch
A PyTorch-based framework to adapt Meta AI's Segment Anything Model (SAM) for improved performance on challenging downstream computer vision tasks using adapters and prompts.
X-LANCE/SLAM-LLM
A deep learning toolkit for training custom multimodal large language models focused on speech, language, audio, and music processing.
lxe/simple-llm-finetuner
A beginner-friendly UI for fine-tuning language models using LoRA on commodity NVIDIA GPUs, though the project is no longer actively maintained.
mymusise/ChatGLM-Tuning
A cost-effective solution for fine-tuning ChatGLM-6B using LoRA, enabling personalized large language models.
predibase/lorax
A multi-LoRA inference server designed to efficiently serve thousands of fine-tuned Large Language Models on a single GPU, drastically cutting serving costs while maintaining high throughput and low latency.
huggingface/diffusers
A modular PyTorch library for state-of-the-art diffusion models, enabling easy generation of images, audio, and more.
Lightricks/ComfyUI-LTXVideo
Extends ComfyUI with advanced custom nodes for the LTX-2 video generation model, enabling powerful text-to-video and image-to-video workflows.
SamurAIGPT/AI-Youtube-Shorts-Generator
Automates YouTube Shorts generation from long videos using AI for highlights, subtitles, and vertical cropping.
denizsafak/abogen
Generate high-quality audiobooks and voiceovers from various text formats with synchronized captions.
RVC-Boss/GPT-SoVITS
A powerful open-source web UI for few-shot voice conversion and text-to-speech, enabling high-quality voice cloning with minimal audio data.
vllm-project/vllm-omni
vLLM-Omni is an efficient, flexible, and easy-to-use framework extending vLLM to serve omni-modality models (text, image, video, audio) with high throughput and an OpenAI-compatible API.
liucongg/ChatGLM-Finetuning
A comprehensive toolkit for fine-tuning ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B models using methods such as Freeze, LoRA, P-tuning, and full-parameter fine-tuning.
abus-aikorea/voice-pro
An AI-powered web application for multimedia content creation, offering speech recognition, voice cloning, multilingual TTS, and YouTube video processing.
jianchang512/ChatTTS-ui
Provides a local web interface and API for the ChatTTS model, enabling text-to-speech synthesis with support for mixed languages and numbers.
jianchang512/clone-voice
A user-friendly web-based tool for voice cloning, text-to-speech, and speech-to-speech conversion, leveraging the Coqui XTTS_v2 model with multi-language support.
MoonInTheRiver/DiffSinger
The official PyTorch implementation of DiffSinger, a singing voice synthesis (SVS) and text-to-speech (TTS) system that uses a shallow diffusion mechanism for high-quality audio generation.
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X, enabling zero-shot multilingual text-to-speech synthesis and voice cloning with emotion control.
SciSharp/LLamaSharp
A cross-platform C#/.NET library for efficient local inference of large language models (LLMs) such as LLaMA and LLaVA.
vladmandic/sdnext
An all-in-one open-source WebUI for AI generative image and video creation, captioning, and processing, built on Stable Diffusion.
FurkanGozukara/Stable-Diffusion
A comprehensive repository offering expert-level tutorials, guides, and courses on various Generative AI technologies, primarily focusing on Stable Diffusion and its ecosystem.
nateraw/stable-diffusion-videos
Create dynamic videos by smoothly transitioning between text prompts using Stable Diffusion's latent space exploration.
MrForExample/ComfyUI-3D-Pack
An extensive node suite that integrates cutting-edge 3D generation algorithms and models into ComfyUI, enabling seamless processing of 3D inputs like meshes and UV textures.