Audio Synthesis Framework
4.8k 2026-04-18

MoonInTheRiver/DiffSinger

DiffSinger is the official PyTorch implementation of a singing voice synthesis (SVS) and text-to-speech (TTS) system that uses a shallow diffusion mechanism to generate high-quality audio.

Core Features

High-quality singing voice synthesis (SVS)
Text-to-speech (TTS) support
Uses a shallow diffusion mechanism
Official implementation of an AAAI 2022 research paper
Supports several SVS pipelines, including MIDI and F0 inputs

Detailed Introduction

DiffSinger is the official PyTorch implementation of the AAAI 2022 paper 'DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism'. It provides a framework for both singing voice synthesis (SVS) and text-to-speech (TTS), using a shallow diffusion model to generate high-fidelity audio. The project offers pipelines that accept inputs such as lyrics, MIDI, and F0, and it integrates with vocoders such as HiFiGAN. It is aimed at researchers and developers working on AI audio generation.
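
To make the "shallow" part of the mechanism more concrete, here is a minimal sketch of the idea as we understand it from the paper: the reverse (denoising) process does not start from pure Gaussian noise at the final step T. It starts at an intermediate step k, from a noised copy of a coarse mel-spectrogram produced by a simpler decoder. The code below is not taken from the DiffSinger repository. The function names, the linear schedule, and the values of T, k, and the beta range are all illustrative assumptions, and plain Python lists stand in for mel-spectrogram frames.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.06):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse_to_step(x0, k, alpha_bars, rng):
    """Sample x_k from q(x_k | x_0) = N(sqrt(alpha_bar_k) * x0, (1 - alpha_bar_k) * I)."""
    a = alpha_bars[k - 1]  # steps are 1-indexed, the list is 0-indexed
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


def shallow_start(coarse_mel, k, alpha_bars, rng):
    """Hypothetical entry point: noise the coarse mel to step k.

    A trained denoiser (not shown) would then run k reverse steps
    instead of the full T.
    """
    return diffuse_to_step(coarse_mel, k, alpha_bars, rng)


if __name__ == "__main__":
    T, k = 100, 54  # assumed total steps and shallow step
    alpha_bars = cumulative_alphas(linear_beta_schedule(T))
    coarse_mel = [0.5] * 8  # stand-in for one frame of a coarse mel-spectrogram
    x_k = shallow_start(coarse_mel, k, alpha_bars, random.Random(0))
    print(len(x_k), "values; reverse steps needed:", k, "of", T)
```

Because k is smaller than T, sampling needs fewer denoising steps, and the starting point already carries the coarse structure of the target spectrogram. In this sketch k is fixed by hand; the denoising network itself is omitted.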
