apache/hamilton
A Python library for building modular, testable, and self-documenting data transformation DAGs with built-in lineage and metadata tracking.
Core Features
Quick Start
pip install "sf-hamilton[visualization]"
Detailed Introduction
Apache Hamilton, an incubating Apache project, is a lightweight Python library for creating and managing data transformation workflows. Data scientists and engineers define dataflows as Directed Acyclic Graphs (DAGs) using standard Python functions, which keeps the code modular, testable, and self-documenting. The library builds the DAG automatically, tracks data lineage, and captures metadata, so each output can be traced back to the functions and inputs that produced it. It is portable across Python environments, from local scripts to production systems such as Airflow or FastAPI, which makes it suitable for ETL, ML, LLM, and BI applications.
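To make the "DAG from standard Python functions" idea concrete, here is a minimal standard-library sketch of the general technique: each function name is treated as a node, and each parameter name is treated as a dependency on another node or on an external input. This is an illustration only, not Hamilton's implementation or API; the functions `spend_per_signup`, `spend_in_thousands`, and `report`, and the helpers `build_dag` and `execute`, are hypothetical names invented for this example.

```python
import inspect

# Hypothetical transformation functions. The function name is the node name;
# each parameter name refers to another node or to an external input.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups

def spend_in_thousands(spend: float) -> float:
    return spend / 1000

def report(spend_per_signup: float, spend_in_thousands: float) -> str:
    return f"{spend_in_thousands:.1f}k spent, {spend_per_signup:.2f} per signup"

def build_dag(functions):
    """Map each node name to (function, list of dependency names)."""
    return {
        fn.__name__: (fn, list(inspect.signature(fn).parameters))
        for fn in functions
    }

def execute(dag, target, inputs):
    """Compute `target` by recursively resolving its dependencies."""
    results = dict(inputs)  # external inputs are already-known values

    def resolve(name, path=()):
        if name in results:
            return results[name]
        if name not in dag:
            raise KeyError(f"no function or input named {name!r}")
        if name in path:
            raise ValueError(f"cycle detected at {name!r}")
        fn, deps = dag[name]
        kwargs = {dep: resolve(dep, path + (name,)) for dep in deps}
        results[name] = fn(**kwargs)
        return results[name]

    return resolve(target)

dag = build_dag([spend_per_signup, spend_in_thousands, report])
print(execute(dag, "report", {"spend": 2000.0, "signups": 40}))
# prints: 2.0k spent, 50.00 per signup
```

In this sketch the dependency lists stored in `dag` double as a simple lineage record: `dag["report"][1]` names the upstream nodes that feed `report`. Because every node is an ordinary function, each one can also be unit tested on its own with plain arguments.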