Tags: #visual-understanding
AI Model Framework
6.0k
om-ai-lab/VLM-R1
VLM-R1 is a stable and generalizable R1-style Large Vision-Language Model that uses reinforcement learning to significantly improve performance on visual understanding tasks.
Multimodal AI Model
17.7k
deepseek-ai/Janus
Janus-Series is a family of unified autoregressive multimodal AI models designed for both understanding and generating content across various modalities, featuring a novel decoupled visual encoding strategy.