Tags: #data-pipeline
vectordotdev/vector
A high-performance, end-to-end observability data pipeline for collecting, transforming, and routing logs and metrics, aimed at reducing cost and giving users more control over their data.
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system that uses LLMs to extract structured data from unstructured text, with the goal of improving reasoning over private data.
apache/airflow
A platform to programmatically author, schedule, and monitor data workflows.
dagster-io/dagster
A cloud-native data pipeline orchestrator designed for the development, production, and observation of data assets, featuring integrated lineage, observability, and a declarative programming model.
towhee-io/towhee
A framework for building fast and simple neural data processing pipelines with LLMs, especially for unstructured multi-modal data.
bespokelabsai/curator
A Python library for generating and curating high-quality synthetic data for AI model training and structured data extraction.
astronomer/astronomer-cosmos
A tool for running dbt Core projects as Apache Airflow DAGs and Task Groups, so that dbt transformations can be orchestrated within Airflow.
apache/dolphinscheduler
A low-code data orchestration platform designed for agile development and high-performance management of complex data workflows and task dependencies.
jitsucom/jitsu
An open-source, self-hosted Segment alternative for real-time event data collection and streaming to data warehouses.