AI/ML Model
2.4k 2026-04-18

X-PLUG/mPLUG-DocOwl

A modularized multimodal large language model designed for OCR-free document understanding.

Core Features

OCR-free Document Understanding
Modularized Multimodal LLM Architecture
High-resolution Multi-page Document Processing
Support for diverse document understanding tasks (VQA, ChartQA)
Open-source training code and models for fine-tuning

Detailed Introduction

mPLUG-DocOwl is a family of modularized multimodal large language models developed by Alibaba Group for OCR-free document understanding. Instead of running a separate OCR step, the models read document images directly and combine visual and linguistic information, which sidesteps the limitations of a traditional OCR pipeline. The family includes DocOwl2 for high-resolution multi-page documents and TinyChart for chart understanding, and the project reports state-of-the-art results on a range of document AI tasks. Training data, models, and code are open source, so researchers and developers can fine-tune the models and build their own document intelligence solutions.
