Generative AI Framework
17.1k 2026-04-18
NVIDIA-NeMo/NeMo
A scalable generative AI framework for building, customizing, and deploying Large Language Models, multimodal models, and Speech AI models (ASR, TTS).
Core Features
Supports Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Speech LLMs.
Provides pre-trained models and checkpoints for rapid development.
Designed for efficient creation, customization, and deployment of AI models.
Scalable and built for researchers and PyTorch developers.
Focuses on audio, speech, and multimodal LLM applications.
Detailed Introduction
NVIDIA NeMo Speech is a scalable generative AI framework for researchers and PyTorch developers. It covers Large Language Models, multimodal AI, and Speech AI, including Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). Users can create, customize, and deploy models by building on the framework's existing code and pre-trained checkpoints instead of starting from scratch, which shortens development for conversational AI and related applications.
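As a rough illustration of starting from a pre-trained checkpoint, the sketch below loads an ASR model by name and transcribes a list of audio files. It assumes the `nemo_toolkit` package with its ASR collection is installed. The helper names `hypothesis_text` and `transcribe_files` are hypothetical and not part of NeMo. The shape of what `transcribe()` returns varies with the NeMo version and model type, so the helper handles plain strings and objects with a `.text` attribute and should be treated as an assumption rather than a guarantee.

```python
from typing import Any, List


def hypothesis_text(result: Any) -> str:
    """Return plain text from one transcription result.

    Assumption: depending on the NeMo version, each result is either a
    plain string or an object exposing a `.text` attribute.
    """
    if isinstance(result, str):
        return result
    return str(getattr(result, "text", result))


def transcribe_files(paths: List[str], model_name: str) -> List[str]:
    """Load a pre-trained ASR checkpoint by name and transcribe audio files."""
    # Imported lazily so this module can be imported without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the named checkpoint, or loads it from the local cache.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)

    # One result per input file; normalize each to a plain string.
    return [hypothesis_text(r) for r in model.transcribe(paths)]
```

A call would look like `transcribe_files(["sample.wav"], model_name="<checkpoint name>")`, where the checkpoint name is taken from the project's published model list and `sample.wav` is a placeholder path.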