Tags: #multimodal-llm
Deep Learning Toolkit / Multimodal LLM Framework
linux
1.0k
X-LANCE/SLAM-LLM
A deep learning toolkit for training custom multimodal large language models focused on speech, language, audio, and music processing.
AI/ML Model
huggingface
2.4k
X-PLUG/mPLUG-DocOwl
A modularized multimodal large language model designed for OCR-free document understanding.
Multimodal Large Language Model
3.6k
NExT-GPT/NExT-GPT
The first end-to-end multimodal large language model (MM-LLM) that perceives input and generates output in arbitrary combinations (any-to-any) of text, image, video, and audio.