om-ai-lab/VLM-R1
A stable and generalizable R1-style Large Vision-Language Model (VLM) framework that enhances visual understanding tasks through reinforcement learning, outperforming SFT models in generalization.
Core Features
Detailed Introduction
VLM-R1 is a stable and generalizable R1-style Large Vision-Language Model framework that builds on the ideas of DeepSeek-R1 to address visual understanding tasks. It uses reinforcement learning to achieve stronger generalization than traditional SFT models, especially on out-of-domain data for tasks such as Referring Expression Comprehension (REC). VLM-R1 has achieved top performance on the OpenCompass Math Leaderboard (among models under 4B parameters) and state-of-the-art results on OVDEval. The project also continuously optimizes model deployment and inference efficiency on Huawei Ascend hardware, supports a variety of VLMs and flexible fine-tuning strategies, and provides a robust framework for developing high-performance, generalizable VLMs.
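The "R1-style" reinforcement learning referenced above generally follows the group-relative policy optimization (GRPO) recipe popularized by DeepSeek-R1: several responses are sampled per prompt, and each response's reward is normalized against its group's statistics to form an advantage. The sketch below illustrates only that core advantage computation; the function name and reward values are illustrative, and VLM-R1's exact training recipe may differ.

```python
# Minimal sketch of GRPO-style group-relative advantages, the core idea
# behind R1-style RL training. Names and numbers here are illustrative,
# not VLM-R1's actual API.
import statistics


def group_relative_advantages(rewards):
    """Normalize each sampled response's reward against the group's
    mean and standard deviation (the group-relative baseline in GRPO)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    # Small epsilon avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + 1e-8) for r in rewards]


# Example: rewards for four sampled answers to one visual query,
# e.g. 1.0 if the predicted REC bounding box matches, 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses that beat the group average receive positive advantages and are reinforced; below-average responses are penalized, without needing a learned value function as a baseline.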