Tags: #large-vision-language-model
AI Model Framework
6.0k
om-ai-lab/VLM-R1
VLM-R1 is a stable and generalizable R1-style Large Vision-Language Model that uses reinforcement learning to improve performance on visual understanding tasks.
Multimodal AI System
2.9k
InternLM/InternLM-XComposer
A comprehensive multimodal AI system for long-term streaming video and audio interaction, with advanced vision-language understanding and composition.