Python Library for Information Extraction
35.6k 2026-04-14

google/langextract

A Python library leveraging LLMs to extract structured information from unstructured text with precise source grounding and interactive visualization.

Core Features

Precise Source Grounding: Maps extractions to exact source text locations for traceability and verification.
Reliable Structured Outputs: Enforces consistent output schemas using few-shot examples and controlled generation.
Optimized for Long Documents: Handles large texts efficiently with chunking, parallel processing, and multi-pass extraction.
Interactive Visualization: Generates self-contained HTML for reviewing extracted entities in their original context.
Flexible LLM Support: Compatible with cloud-based (Gemini, OpenAI) and local (Ollama) large language models.
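
To make the source-grounding idea concrete, here is a minimal standard-library sketch of mapping extracted strings back to character offsets in the source text. It is an illustration only, not LangExtract's implementation or API: `GroundedSpan` and `ground_extractions` are hypothetical names, and exact substring matching is an assumed simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroundedSpan:
    """Hypothetical record tying an extracted string to its source location."""
    text: str
    start: Optional[int]  # inclusive character offset, None if not found
    end: Optional[int]    # exclusive character offset, None if not found


def ground_extractions(source: str, extractions: List[str]) -> List[GroundedSpan]:
    """Locate each extracted string in the source by exact substring match."""
    spans: List[GroundedSpan] = []
    cursor = 0
    for text in extractions:
        if not text:
            spans.append(GroundedSpan(text, None, None))
            continue
        # Search from the previous match first so repeated strings map in order.
        start = source.find(text, cursor)
        if start == -1:
            # Fall back to searching the whole source.
            start = source.find(text)
        if start == -1:
            spans.append(GroundedSpan(text, None, None))
            continue
        end = start + len(text)
        spans.append(GroundedSpan(text, start, end))
        cursor = end
    return spans


note = "Patient was given 250 mg amoxicillin, then 250 mg ibuprofen."
spans = ground_extractions(note, ["250 mg", "amoxicillin", "250 mg", "ibuprofen"])
for span in spans:
    print(span)
```

Because every span carries offsets, a reviewer can check that `note[span.start:span.end]` equals the extracted text, which is the kind of traceability the first feature describes.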

Quick Start

pip install langextract

Detailed Introduction

LangExtract is a Python library for extracting structured data from unstructured text such as clinical notes or reports. It uses Large Language Models (LLMs) guided by user-defined instructions and few-shot examples to identify and organize key details. Every extraction is mapped back to its location in the source text, outputs follow a consistent schema, and long documents are handled through chunking, parallel processing, and multi-pass extraction. The library also generates interactive HTML visualizations and works with both cloud-based and local models, so it can be adapted to new domains without model fine-tuning.
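
The long-document handling described above can be sketched with the standard library alone. This is an assumed illustration, not LangExtract's code: `chunk_by_sentence`, `extract_doses`, and `extract_long` are hypothetical names, the sentence splitter is deliberately naive, and a regular expression stands in for the LLM call so the example runs offline.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Chunk = Tuple[int, str]      # (offset of the chunk in the document, chunk text)
Hit = Tuple[str, int, int]   # (extracted text, start offset, end offset)


def chunk_by_sentence(text: str, max_chars: int) -> List[Chunk]:
    """Pack whole sentences into chunks of roughly max_chars characters."""
    chunks: List[Chunk] = []
    start = end = None
    # Naive splitter: it would also break on decimals such as "2.5".
    for m in re.finditer(r"[^.!?]+[.!?]*\s*", text):
        if start is None:
            start, end = m.start(), m.end()
        elif m.end() - start <= max_chars:
            end = m.end()
        else:
            chunks.append((start, text[start:end]))
            start, end = m.start(), m.end()
    if start is not None:
        chunks.append((start, text[start:end]))
    return chunks


def extract_doses(chunk: Chunk) -> List[Hit]:
    """Stand-in for a model call: find dosage mentions within one chunk."""
    offset, text = chunk
    # Adding the chunk offset converts chunk-local positions to document positions.
    return [(m.group(), offset + m.start(), offset + m.end())
            for m in re.finditer(r"\d+ mg", text)]


def extract_long(text: str, max_chars: int = 70, workers: int = 4) -> List[Hit]:
    """Process chunks in parallel and merge the hits in document order."""
    chunks = chunk_by_sentence(text, max_chars)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(extract_doses, chunks))
    return sorted((hit for hits in per_chunk for hit in hits), key=lambda h: h[1])


note = ("Start 250 mg amoxicillin daily. Add 400 mg ibuprofen as needed. "
        "Stop after 7 days. Resume 125 mg amoxicillin if symptoms return.")
hits = extract_long(note)
for hit in hits:
    print(hit)
```

Keeping each chunk's offset alongside its text is the design choice that lets results from separately processed chunks remain grounded in the original document.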

© 2026 OSS Alternative. hotgithub.com - All rights reserved.