Tags: #visual-understanding

AI Model Framework

6.0k

om-ai-lab/VLM-R1

VLM-R1 is a stable and generalizable R1-style Large Vision-Language Model that uses reinforcement learning to improve performance on visual understanding tasks.

large vision-language model, reinforcement learning, fine-tuning


Multimodal AI Model

17.7k

deepseek-ai/Janus

Janus-Series is a family of unified autoregressive multimodal AI models designed for both understanding and generating content across various modalities, featuring a novel decoupled visual encoding strategy.

multimodal ai, large language model, image generation