Tags: #document-processing - OSS Alternative - Discover Top Open Source Alternatives to Popular Software

Document Processing and AI Data Preparation Library
Python
58.5k

docling-project/docling

Docling simplifies document processing: it parses diverse formats, offers advanced PDF understanding, and integrates with the generative AI ecosystem.

Data Processing Library
Python
14.6k

Unstructured-IO/unstructured

An open-source ETL solution for transforming complex documents into clean, structured data formats, optimized for language models.

AI/ML Framework, RAG Development Kit
Python
14.9k

llmware-ai/llmware

A unified Python framework for building local, private, and secure enterprise RAG pipelines using small, specialized LLMs and a comprehensive model catalog.

AI-powered Document Processing Platform
Python
5.2k

katanaml/sparrow

A production-ready platform for structured data extraction and instruction calling using ML, LLM, and Vision LLM technologies.

AI-powered Multimodal Data Extraction Library
Python
1.5k

emcf/thepipe

A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.

LLM-powered Data Processing Framework
Python
3.7k

ucbepic/docetl

DocETL is an agentic LLM-powered framework designed for building and executing complex data processing and ETL pipelines, especially for documents.
