apache/hamilton
Apache Hamilton is a lightweight Python library that enables data scientists and engineers to define testable, modular, and self-documenting dataflows (DAGs) with built-in lineage and metadata, portable across any Python environment.
Core Features
Quick Start
pip install "sf-hamilton[visualization]"Detailed Introduction
Apache Hamilton is an incubating Apache project designed to bring structure and modularity to data-intensive Python applications. It simplifies the creation of complex data transformation pipelines by allowing users to define directed acyclic graphs (DAGs) through regular Python functions, automatically inferring dependencies. This approach ensures dataflows are testable, self-documenting, and inherently track data lineage. Its portability allows it to integrate into diverse environments, from local scripts to large-scale orchestration systems, empowering data teams to build robust, maintainable, and quality-assured data solutions for ETL, ML, LLM, and BI use cases.