Tags: #audio-processing
huggingface/transformers
A comprehensive library providing state-of-the-art pre-trained models for various machine learning tasks across text, vision, audio, and multimodal domains, facilitating both inference and training.
X-LANCE/SLAM-LLM
A deep learning toolkit for training custom multimodal large language models focused on speech, language, audio, and music processing.
IAHispano/Applio
A user-friendly AI voice conversion tool focused on output quality, performance, and customization.
Blaizzy/mlx-audio
An efficient audio processing library built on Apple's MLX framework, enabling fast text-to-speech, speech-to-text, and speech-to-speech capabilities on Apple Silicon devices.
collabora/WhisperLive
An optimized speech-to-text application that uses OpenAI's Whisper model for nearly-live audio transcription.
jianchang512/clone-voice
A user-friendly, open-source tool that clones any human voice to generate speech from text or convert existing audio, featuring a web interface and multi-language support.
riffusion/riffusion-hobby
A library for real-time music and audio generation using Stable Diffusion, with a CLI, an interactive app, and an API.