Tags: #large-vision-language-model

AI Model Framework

6.0k

om-ai-lab/VLM-R1

VLM-R1 is a stable and generalizable R1-style Large Vision-Language Model that uses reinforcement learning to improve performance on visual understanding tasks.

large vision-language model, reinforcement learning, fine-tuning

Multimodal AI System

2.9k

InternLM/InternLM-XComposer

A comprehensive multimodal AI system specializing in long-term streaming video and audio interactions, offering advanced vision-language understanding and composition.

multimodal ai, large vision language model, video understanding

Replaces: GPT-4V
