
X-LANCE/SLAM-LLM

A deep learning toolkit for training custom multimodal large language models focused on speech, language, audio, and music processing.

Core Features

Toolkit for custom multimodal LLM (MLLM) training.
Focus on speech, language, audio, and music processing.
Provides detailed training recipes and high-performance checkpoints.
Supports large-scale industrial training via DeepSpeed integration and dynamic frame batching.
Reproduces advanced MLLM systems such as SLAM-Omni.
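To make the DeepSpeed feature above concrete: DeepSpeed-based training is typically driven by a JSON configuration. The snippet below is a minimal, generic DeepSpeed config expressed as a Python dict — the keys shown (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `fp16`, `zero_optimization`) are standard DeepSpeed options, but the values and the overall shape are illustrative, not SLAM-LLM's actual recipe format.

```python
import json

# Minimal, generic DeepSpeed configuration (illustrative only; SLAM-LLM's
# recipes may structure this differently). ZeRO stage 2 shards optimizer
# state and gradients across GPUs to cut per-device memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

print(json.dumps(ds_config, indent=2))
```

In practice such a dict is serialized to a file and passed to the DeepSpeed launcher or engine at initialization.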

Detailed Introduction

SLAM-LLM is a deep learning toolkit for researchers and developers who want to build and train custom multimodal large language models (MLLMs). It specializes in integrating and processing diverse data types, including speech, language, audio, and music. The framework ships with comprehensive training recipes and high-performance inference checkpoints, and supports multi-task training, DeepSpeed integration, and dynamic frame batching, making it efficient for large-scale industrial training. It also fully reproduces cutting-edge MLLM systems such as SLAM-Omni.
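The dynamic frame batching mentioned above can be sketched as follows: instead of a fixed number of utterances per batch, variable-length audio clips are packed so that each batch stays under a total-frame budget, keeping GPU memory use roughly constant. The function below is a hypothetical illustration of that idea, not SLAM-LLM's actual implementation; all names are made up.

```python
def dynamic_frame_batches(lengths, max_frames):
    """Greedily pack utterance indices into batches whose summed frame
    count stays within max_frames. An utterance longer than the budget
    gets a batch of its own."""
    # Sort by length so similarly sized clips share a batch (less padding).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current, current_frames = [], [], 0
    for i in order:
        if current and current_frames + lengths[i] > max_frames:
            batches.append(current)
            current, current_frames = [], 0
        current.append(i)
        current_frames += lengths[i]
    if current:
        batches.append(current)
    return batches

lengths = [300, 1200, 450, 800, 200]  # frames per utterance
print(dynamic_frame_batches(lengths, max_frames=1500))
# → [[4, 0, 2], [3], [1]]
```

Short clips end up grouped together while long clips train alone, so every batch does a comparable amount of work regardless of utterance length.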
