Tags: #data-extraction

Web Data API for AI Agents
106.9k

firecrawl/firecrawl

A Web Data API that provides clean, LLM-ready web data for AI agents, supporting scalable search, scraping, and interaction with the web.

Document Processing Library for AI
Python
57.6k

docling-project/docling

A Python library that processes and parses diverse document formats and prepares them for use in generative AI ecosystems.

Web Scraping Framework
Python
36.5k

D4Vinci/Scrapling

An adaptive Python web scraping framework that handles everything from single requests to full-scale crawls, featuring anti-bot bypass and self-healing parsers.

Data Processing Library
Python
14.5k

Unstructured-IO/unstructured

An open-source ETL solution for transforming complex documents into clean, structured data formats, optimized for language models.

AI/ML Framework
Python
48.6k

run-llama/llama_index

LlamaIndex is an open-source framework for building agentic applications, specializing in document processing, OCR, parsing, and indexing to empower LLMs.

PDF Processing Library / Document Automation Tool
Java 11+
16.5k

opendataloader-project/opendataloader-pdf

An open-source PDF parser for AI-ready data extraction and automated PDF accessibility compliance.

AI-powered Knowledge Graph Builder
Python
4.6k

neo4j-labs/llm-graph-builder

Transforms diverse unstructured data into structured Neo4j knowledge graphs using large language models (LLMs) and LangChain.

AI-powered Web Scraping Library
Python
23.2k

ScrapeGraphAI/Scrapegraph-ai

A Python library that leverages LLMs and graph logic to simplify web scraping and data extraction from various sources.

AI-powered Multimodal Data Extraction Library
Python
1.5k

emcf/thepipe

A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.