Tags: #etl
apache/seatunnel
SeaTunnel is a high-performance, distributed data integration tool designed to synchronize massive amounts of multimodal data from diverse sources efficiently and reliably.
Unstructured-IO/unstructured
An open-source ETL solution for transforming complex documents into clean, structured data formats, optimized for language models.
apache/airflow
A robust open-source platform for programmatically authoring, scheduling, and monitoring data workflows.
datachain-ai/datachain
DataChain is a Python-based AI data warehouse for transforming, analyzing, and versioning unstructured multimodal data such as video, audio, PDFs, and images.
apache/hamilton
A Python library for building modular, testable, and self-documenting data transformation DAGs with built-in lineage and metadata tracking.
dagster-io/dagster
A cloud-native orchestrator for developing, running in production, and observing data assets, with integrated lineage tracking, observability, and a declarative programming model.
PrefectHQ/prefect
Prefect is a Python-based workflow orchestration framework for building resilient, dynamic data pipelines, with built-in scheduling, caching, and retries for automating complex data processes.
apache/hop
An open-source platform covering all aspects of data and metadata orchestration, used to design, run, and manage data integration pipelines and workflows.
astronomer/astronomer-cosmos
A library that runs dbt Core projects as Apache Airflow DAGs and Task Groups, so dbt transformations can be orchestrated alongside other Airflow tasks.