om-ai-lab/VLM-R1
AI Model Framework
6.0k 2026-05-01

VLM-R1 is a stable and generalizable R1-style large vision-language model that uses reinforcement learning to improve performance on visual understanding tasks.

Core Features

Full and LoRA Fine-tuning for GRPO
Support for Multi-node and Multi-image Input Training
Compatibility with various VLMs like QwenVL and InternVL
Optimized inference with xllm and vllm-ascend frameworks
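
GRPO (Group Relative Policy Optimization) in the feature list refers to a reinforcement learning method that scores each sampled response relative to the other responses sampled for the same prompt, instead of using a learned value model. The sketch below illustrates only that group-relative normalization step. It is not taken from the VLM-R1 code base: the function name `group_relative_advantages`, the `eps` term, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration; real GRPO implementations may
    differ in how they compute the standard deviation or handle ties.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division defined when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses sampled for one prompt, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group mean receive a positive advantage and are reinforced, while those below the mean are discouraged. When all responses in a group receive the same reward, every advantage is zero and the group contributes no learning signal.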

Detailed Introduction

VLM-R1 is an R1-style large vision-language model designed for visual understanding. Building on the DeepSeek-R1 approach, it trains with reinforcement learning, specifically Group Relative Policy Optimization (GRPO), and is reported to be more stable and to generalize better than conventional supervised fine-tuning (SFT), especially on out-of-domain data. The project reports state-of-the-art performance on tasks such as Referring Expression Comprehension (REC) and Open-Vocabulary Detection (OVD), and it offers full and LoRA fine-tuning along with broad hardware compatibility, including Huawei Ascend platforms.
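
In Referring Expression Comprehension, the model must output a bounding box for the object a phrase describes, so a prediction can be checked automatically against a ground-truth box. One common way to turn that check into a reinforcement learning reward is intersection over union (IoU). This listing does not say which reward VLM-R1 uses, so the following is a minimal sketch under assumptions: the function name `box_iou` and the `(x1, y1, x2, y2)` box format are hypothetical choices for illustration.

```python
def box_iou(pred, target):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; assumes x1 <= x2 and y1 <= y2.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Width and height of the overlap, clamped to zero for disjoint boxes
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    return inter / union if union > 0 else 0.0


# Predicted box overlaps a quarter of the ground-truth box.
reward = box_iou((0, 0, 10, 10), (5, 5, 15, 15))
```

A reward of this kind ranges from 0 for non-overlapping boxes to 1 for an exact match, which gives the policy a graded signal and not just a pass/fail outcome.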
