Multimodal AI Model Suite
10.0k 2026-05-06
OpenGVLab/InternVL
An open-source family of multimodal large language models (MLLMs) that aims to match or exceed the performance of commercial models such as GPT-4o and GPT-5.
Core Features
Pioneering open-source multimodal LLM family.
Achieves state-of-the-art performance across diverse multimodal tasks (general, reasoning, text, agentic).
Offers various model sizes, including large-scale (e.g., 241B) and efficient versions (e.g., 20B).
Open-sources its training code and data, and supports the Hugging Face `transformers` format.
Incorporates advanced techniques like Variable Visual Position Encoding and Mixed Preference Optimization.
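To put the listed model sizes in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes 2 bytes per parameter (as with bf16 or fp16 storage) and takes the 241B and 20B figures above as nominal total parameter counts. The helper name `weight_memory_gb` is hypothetical and not part of the InternVL codebase. Activations, the KV cache, and any quantization are ignored, so real requirements will differ.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision
    (2 bytes by default, as with bf16/fp16).
    """
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts taken from the sizes named in the feature list.
sizes = {"241B": 241e9, "20B": 20e9}

estimates = {name: weight_memory_gb(n) for name, n in sizes.items()}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.0f} GB of weights at 2 bytes per parameter")
```

Under these assumptions the larger model needs roughly twelve times the weight memory of the smaller one, which is the practical motivation for offering both scales.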
Detailed Introduction
The InternVL family is an open-source suite of multimodal large language models (MLLMs) designed to rival, and potentially surpass, commercial offerings such as GPT-4o and GPT-5. The project, recognized with a CVPR 2024 Oral, reports state-of-the-art performance across multimodal, reasoning, text, and agentic tasks. It emphasizes transparency by open-sourcing its training code and datasets, which makes advanced multimodal AI accessible to researchers and developers.