Tags: #document-processing
docling-project/docling
Docling simplifies document processing: it parses diverse formats, with advanced PDF understanding, and integrates with the generative AI ecosystem.
Unstructured-IO/unstructured
An open-source ETL solution for transforming complex documents into clean, structured data formats, optimized for language models.
llmware-ai/llmware
A unified Python framework for building local, private, and secure enterprise RAG pipelines using small, specialized LLMs and a comprehensive model catalog.
katanaml/sparrow
A production-ready platform for structured data extraction and instruction calling using ML, LLM, and Vision LLM technologies.
emcf/thepipe
A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.
ucbepic/docetl
DocETL is an agentic, LLM-powered framework for building and executing complex data processing and ETL pipelines, especially for documents.