AI-powered Document Processing Platform
5.2k 2026-04-13

katanaml/sparrow

Sparrow is a production-ready platform that extracts structured data from documents and images, and supports instruction calling, using ML, LLM, and Vision LLM technologies.

Core Features

Universal document processing (invoices, receipts, forms, statements)
Pluggable architecture with multiple backends (MLX, Ollama, vLLM, Docker)
Multi-format support for images and multi-page PDFs
JSON schema-based extraction with automatic validation
API-first design for easy integration and instruction calling
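
To make the "schema-based extraction with automatic validation" feature more concrete, here is a minimal sketch of the general idea: model output is parsed as JSON and checked against a declared set of fields. This is not Sparrow's own code or API. The schema format (a plain field-to-type map rather than full JSON Schema), the field names, and the validate_extraction helper are all assumptions made for illustration.

```python
import json

# Hypothetical schema for an invoice: field name -> expected Python type.
# A simplified stand-in for a real JSON Schema document.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "invoice_date": str,
    "total_amount": float,
}


def validate_extraction(raw_json, schema):
    """Parse raw model output and report missing or wrongly typed fields."""
    data = json.loads(raw_json)
    errors = []
    for field, expected_type in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(data[field]).__name__}"
            )
    return data, errors


if __name__ == "__main__":
    # Illustrative model output, not real Sparrow output.
    raw = '{"invoice_number": "INV-1", "invoice_date": "2026-01-05", "total_amount": 118.5}'
    data, errors = validate_extraction(raw, INVOICE_SCHEMA)
    print(data if not errors else errors)
```

An empty error list means the extracted record can be passed downstream. A non-empty list could trigger a retry or a manual review, depending on the workflow.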

Quick Start

python api.py

Detailed Introduction

Sparrow is a production-ready platform for extracting structured data from documents and images. Using machine learning, Large Language Models (LLMs), and Vision LLMs, it turns unstructured content such as invoices, receipts, and forms into clean, queryable JSON. Its pluggable architecture, multi-backend support, and API-first design let it fit into enterprise data-processing workflows and support instruction calling.
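
One common way to build a pluggable, multi-backend design is a registry that maps backend names to interchangeable callables. The sketch below illustrates that pattern only. It does not reflect how Sparrow is implemented, and every name in it (BACKENDS, register_backend, run_extraction, the "echo" backend) is hypothetical.

```python
import json
from typing import Callable, Dict

# Hypothetical registry: backend name -> function(document_bytes, query) -> JSON string.
BACKENDS: Dict[str, Callable[[bytes, str], str]] = {}


def register_backend(name: str):
    """Decorator that adds a backend function to the registry under `name`."""
    def decorator(func):
        BACKENDS[name] = func
        return func
    return decorator


@register_backend("echo")
def echo_backend(document: bytes, query: str) -> str:
    # Stand-in for a real model call: reports the query and the document size.
    return json.dumps({"query": query, "bytes": len(document)})


def run_extraction(backend: str, document: bytes, query: str) -> str:
    """Dispatch to the chosen backend, failing clearly if it is not registered."""
    if backend not in BACKENDS:
        raise ValueError(
            f"unknown backend: {backend!r}; available: {sorted(BACKENDS)}"
        )
    return BACKENDS[backend](document, query)


if __name__ == "__main__":
    print(run_extraction("echo", b"abc", "total"))
```

With this shape, supporting another inference engine means registering one more function. Calling code only changes the backend name it passes in.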
