Tags: #etl
apache/seatunnel
SeaTunnel is a high-performance, distributed data integration tool designed to synchronize massive amounts of multimodal data from diverse sources with efficiency and stability.
Unstructured-IO/unstructured
An open-source ETL solution for transforming complex documents into clean, structured data formats, optimized for language models.
apache/airflow
A platform to programmatically author, schedule, and monitor data workflows.
kedro-org/kedro
A Python framework for building reproducible, maintainable, and modular data engineering and data science pipelines using software engineering best practices.
apache/hamilton
Apache Hamilton is a lightweight Python library that enables data scientists and engineers to define testable, modular, and self-documenting dataflows (DAGs) with built-in lineage and metadata, portable across any Python environment.
dagster-io/dagster
A cloud-native data pipeline orchestrator designed for the development, production, and observation of data assets, featuring integrated lineage, observability, and a declarative programming model.
PrefectHQ/prefect
Prefect is a Python-based workflow orchestration framework designed to build resilient, dynamic data pipelines that automate processes and recover from unexpected changes.
apache/hop
An open-source platform for data and metadata orchestration, covering data integration and pipeline management.
astronomer/astronomer-cosmos
A tool for running dbt Core projects as Apache Airflow DAGs and Task Groups, so that data transformations are orchestrated from Airflow.
ucbepic/docetl
DocETL is an agentic, LLM-powered framework for building and executing complex data processing and ETL pipelines, especially for documents.