Tags: #document-processing

Document Processing Library for AI
Python
57.6k

docling-project/docling

A Python library that parses a wide range of document formats and prepares the output for use in generative AI applications.

Data Processing Library
Python
14.5k

Unstructured-IO/unstructured

An open-source ETL solution that transforms complex documents into clean, structured data for use with language models.

CLI Tool, Utility Library, Document Converter
Python 3.10+
106.6k

microsoft/markitdown

A Python utility for converting various file formats and office documents into structured Markdown, optimized for LLM consumption and text analysis.

AI-powered Document Processing Platform
Python
5.2k

katanaml/sparrow

A production-ready platform for structured data extraction and instruction calling on documents and images, using ML, LLM, and Vision LLM technologies.

AI-powered Multimodal Data Extraction Library
Python
1.5k

emcf/thepipe

A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.