OSS Alternative - Discover Top Open Source Alternatives to Popular Software

google/langextract

A Python library leveraging LLMs to extract structured information from unstructured text with precise source grounding and interactive visualization.

Core Features

Precise Source Grounding: Maps extractions to exact source text locations for traceability and verification.

Reliable Structured Outputs: Enforces consistent output schemas using few-shot examples and controlled generation.

Optimized for Long Documents: Efficiently handles large texts via chunking, parallel processing, and multi-pass extraction.

Interactive Visualization: Generates self-contained HTML for reviewing extracted entities in their original context.

Flexible LLM Support: Compatible with various LLMs, including cloud models (Gemini, OpenAI) and local models (Ollama).
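Three of the features above, chunking, multi-pass extraction, and source grounding, fit together in a small, self-contained sketch. It does not use LangExtract or reproduce its API or internals. `fake_model` is a hypothetical stand-in for an LLM call. The chunk size, the overlap, and the rule of taking only the first verbatim match in a chunk are simplifying assumptions. The point it illustrates is that every result carries character offsets into the original document, so each extraction can be checked against the source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A grounded extraction: class label, exact text, and [start, end) offsets in the full document."""
    label: str
    text: str
    start: int
    end: int


def chunk_text(text: str, size: int, overlap: int) -> list[tuple[int, str]]:
    """Split text into overlapping windows, returning (document_offset, chunk) pairs."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    # A new window is only needed while it would cover text beyond the previous window's overlap.
    return [(start, text[start:start + size]) for start in range(0, max(len(text) - overlap, 1), step)]


def fake_model(chunk: str) -> list[tuple[str, str]]:
    """Hypothetical stand-in for an LLM call: returns (class, text) pairs 'found' in the chunk."""
    vocabulary = {"aspirin": "medication", "ibuprofen": "medication"}
    return [(label, word) for word, label in vocabulary.items() if word in chunk]


def ground(offset: int, chunk: str, label: str, snippet: str) -> Span | None:
    """Locate a snippet verbatim in its chunk and convert its position to document-level offsets."""
    local = chunk.find(snippet)  # simplification: only the first match in the chunk is used
    if local == -1:
        return None  # ungrounded: the text does not appear in the source, so it is dropped
    start = offset + local
    return Span(label, snippet, start, start + len(snippet))


def extract_long(text: str, size: int = 40, overlap: int = 10, passes: int = 1) -> list[Span]:
    """Run the stand-in model over every chunk, `passes` times, and merge the grounded results."""
    found: set[Span] = set()
    for _ in range(passes):
        for offset, chunk in chunk_text(text, size, overlap):
            for label, snippet in fake_model(chunk):
                span = ground(offset, chunk, label, snippet)
                if span is not None:
                    # The set removes duplicates from overlapping chunks and repeated passes.
                    found.add(span)
    return sorted(found, key=lambda s: (s.start, s.end, s.label))


note = "Patient takes aspirin daily. Later switched to ibuprofen for pain; aspirin stopped."
for span in extract_long(note, passes=2):
    print(span.label, span.text, span.start, span.end)
```

Because the offsets refer to the whole document, `note[span.start:span.end]` always returns the extracted text. The overlap needs to be roughly as long as the longest expected extraction. Otherwise an entity that straddles a chunk boundary may not appear whole in any window.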

Quick Start

pip install langextract

Detailed Introduction

LangExtract is a Python library for extracting structured information from unstructured text documents using large language models (LLMs). It is suited to material such as clinical notes and reports. It identifies and organizes key details and ties every extraction to its exact location in the source text. Output schemas are kept consistent through few-shot examples and controlled generation, and long documents are handled through chunking, parallel processing, and multiple extraction passes. Results can be reviewed in an interactive HTML visualization. The library works with cloud-hosted models such as Google Gemini as well as local models served through Ollama. Because extraction tasks are defined through a few examples rather than by training, it can be applied to a new domain without fine-tuning a model.
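The review step, in which extractions are highlighted in their original context, comes down to turning grounded character offsets into a standalone HTML page. The sketch below is not LangExtract's own visualizer. The `render_html` function, the `(start, end, label)` span format, and the styling are assumptions made for illustration. Its only interactivity is a hover tooltip showing each highlight's class.

```python
from __future__ import annotations

import html


def render_html(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Return a standalone HTML page with each (start, end, label) span highlighted in place.

    Spans use [start, end) character offsets into `text` and must not overlap in this sketch.
    """
    parts = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        parts.append(html.escape(text[cursor:start]))  # plain text before the span
        # The title attribute shows the extraction class as a tooltip on hover.
        parts.append(f'<mark title="{html.escape(label)}">{html.escape(text[start:end])}</mark>')
        cursor = end
    parts.append(html.escape(text[cursor:]))  # trailing plain text after the last span
    body = "".join(parts)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><style>mark { background: #ffe58a; }</style></head>'
        f"<body><p>{body}</p></body></html>\n"
    )


page = render_html("Aspirin 81 mg & rest", [(0, 7, "medication"), (8, 13, "dosage")])
print(page)
```

All text is passed through `html.escape`, so characters such as `&` or `<` in the source cannot break the page. The output needs no external files and can be opened directly in a browser.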