Ecosystem & Stack: CUDA
vllm-project/vllm
vLLM is a high-throughput and memory-efficient open-source library designed for fast and easy serving of large language models.
kvcache-ai/Mooncake
A KVCache-centric disaggregated architecture for high-performance LLM serving; it is the serving platform behind Moonshot AI's Kimi.
buxuku/SmartSub
A cross-platform desktop tool for batch generating and translating subtitles for videos and audio using various AI services and local models.
bitsandbytes-foundation/bitsandbytes
A PyTorch library enabling accessible large language models through k-bit quantization, significantly reducing memory consumption for both inference and training.
tianrun-chen/SAM-Adapter-PyTorch
A PyTorch-based framework to adapt Meta AI's Segment Anything Model (SAM) for improved performance on challenging downstream computer vision tasks using adapters and prompts.
X-LANCE/SLAM-LLM
A deep learning toolkit for training custom multimodal large language models focused on speech, language, audio, and music processing.
lxe/simple-llm-finetuner
A beginner-friendly UI for fine-tuning large language models (LLMs) using the LoRA method on commodity NVIDIA GPUs.
mymusise/ChatGLM-Tuning
A cost-effective solution for fine-tuning ChatGLM-6B with LoRA, enabling personalized large language models.
huggingface/diffusers
A modular PyTorch library for state-of-the-art diffusion models, enabling easy inference and training for image, video, and audio generation.
Sanster/IOPaint
An open-source, AI-driven tool for advanced image inpainting, outpainting, object removal, and replacement using state-of-the-art models.
Lightricks/ComfyUI-LTXVideo
Extends ComfyUI with advanced custom nodes for the LTX-2 video generation model, enabling powerful text-to-video and image-to-video workflows.
kuprel/min-dalle
A fast, minimal PyTorch port of DALL·E Mini for efficient text-to-image generation.
RVC-Boss/GPT-SoVITS
A powerful web-based tool for few-shot voice cloning and text-to-speech, enabling high-quality voice generation from minimal audio data.
vllm-project/vllm-omni
A framework for efficient, fast, and cheap serving of omni-modality (text, image, video, audio) AI models.
liucongg/ChatGLM-Finetuning
A toolkit for fine-tuning ChatGLM series models (ChatGLM-6B, ChatGLM2-6B, ChatGLM3-6B) on downstream NLP tasks, using methods such as Freeze, LoRA, P-Tuning, and full-parameter training.
abus-aikorea/voice-pro
An AI-powered web application for comprehensive multimedia content creation, offering speech recognition, voice cloning, text-to-speech, and multilingual translation.
jianchang512/ChatTTS-ui
A local web interface and API for ChatTTS that synthesizes speech from text, including text that mixes Chinese, English, and numerals.
MoonInTheRiver/DiffSinger
The official PyTorch implementation of DiffSinger, a singing voice synthesis (SVS) and text-to-speech (TTS) system that uses a shallow diffusion mechanism for high-quality audio generation.
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X, enabling zero-shot multilingual text-to-speech synthesis and voice cloning with emotion control.
vladmandic/sdnext
SD.Next is a powerful, all-in-one open-source WebUI for AI generative image and video creation, offering extensive model support, advanced processing, and cross-platform compatibility.
nateraw/stable-diffusion-videos
A tool for creating videos with Stable Diffusion by smoothly morphing between different text prompts.
MrForExample/ComfyUI-3D-Pack
An extensive node suite that integrates advanced 3D input processing and asset generation into ComfyUI using cutting-edge AI algorithms and models.
kyegomez/OpenMythos
An open-source, theoretical reconstruction of the Claude Mythos LLM architecture, featuring a Recurrent-Depth Transformer and sparse Mixture of Experts for advanced reasoning.
BIT-DataLab/Edit-Banana
Edit Banana transforms static, uneditable content like images of diagrams into fully manipulatable and editable assets using advanced AI.