Data Orchestration Library
2.4k 2026-04-14

apache/hamilton

A Python library for building modular, testable, and self-documenting data transformation DAGs with built-in lineage and metadata tracking.

Core Features

Define testable, modular, self-documenting dataflows.
Automatic DAG construction from regular Python functions.
Built-in data lineage, tracing, and metadata tracking.
Portable execution across various Python environments (scripts, notebooks, Airflow, FastAPI).
UI for visualization, cataloging, and monitoring of dataflows.

Quick Start

pip install "sf-hamilton[visualization]"

Detailed Introduction

Apache Hamilton, an incubating Apache project, is a lightweight Python library for building and managing data transformation workflows. Data scientists and engineers define a dataflow as a Directed Acyclic Graph (DAG) of standard Python functions, which keeps each step modular, testable, and self-documenting. The library builds the DAG automatically, tracks data lineage, and captures metadata, so it stays clear how each result was produced. Because a dataflow is plain Python, the same definition can run in a local script, a notebook, or inside a production system such as Airflow or FastAPI, which makes it a fit for ETL, ML, LLM, and BI applications.
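To make the idea of "a DAG built from regular Python functions" concrete, here is a minimal standard-library sketch. It is not Hamilton's implementation or API. It assumes the convention that a function's name is the node it produces and its parameter names are the nodes or inputs it depends on. The functions `spend_per_signup`, `spend_doubled`, and `summary`, and the helpers `build_dag` and `execute`, are hypothetical names made up for illustration.

```python
import inspect


# Hypothetical node functions: the function name is the output it produces,
# and each parameter name is a dependency (another node or an external input).
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_doubled(spend: float) -> float:
    return spend * 2


def summary(spend_per_signup: float, spend_doubled: float) -> str:
    return f"{spend_per_signup:.2f} per signup, {spend_doubled:.0f} doubled"


def build_dag(functions):
    """Map each function name to (function, tuple of its parameter names)."""
    return {
        fn.__name__: (fn, tuple(inspect.signature(fn).parameters))
        for fn in functions
    }


def execute(dag, target, inputs, cache=None):
    """Compute `target` by resolving its dependencies recursively.

    Illustrative only: results are memoized per call tree, and there is
    no cycle detection.
    """
    cache = {} if cache is None else cache
    if target in cache:
        return cache[target]
    if target in inputs:
        return inputs[target]
    if target not in dag:
        raise KeyError(f"no function or input provides {target!r}")
    fn, deps = dag[target]
    kwargs = {dep: execute(dag, dep, inputs, cache) for dep in deps}
    cache[target] = fn(**kwargs)
    return cache[target]


dag = build_dag([spend_per_signup, spend_doubled, summary])
result = execute(dag, "summary", {"spend": 100.0, "signups": 4})
print(result)
```

Because each node is an ordinary function, it can be unit-tested by calling it directly with plain values, and the dependency map in `dag` doubles as a simple lineage record of which inputs feed each output.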
