Tags: #data-pipeline

Observability Data Pipeline
Rust
21.6k

vectordotdev/vector

A high-performance, end-to-end observability data pipeline for collecting, transforming, and routing logs and metrics, designed to reduce costs and give users more control over their observability data.

AI/ML Framework
32.1k

microsoft/graphrag

GraphRAG is a modular, graph-based Retrieval-Augmented Generation (RAG) system that uses LLMs to extract structured data from unstructured text, improving LLM reasoning over private datasets.

AI Pipeline Framework
Python
2.1k

google-gemini/genai-processors

A lightweight Python library for building modular, asynchronous, and composable AI pipelines, enabling efficient, parallel, and multimodal content processing for Generative AI applications.

Workflow Orchestration Platform
Python
45.0k

apache/airflow

A robust open-source platform for programmatically authoring, scheduling, and monitoring data workflows.

Data Orchestration Platform
Python
15.3k

dagster-io/dagster

A cloud-native data pipeline orchestrator designed for the development, production, and observation of data assets, featuring integrated lineage, observability, and a declarative programming model.

AI Data Processing Framework
Python
3.4k

towhee-io/towhee

Towhee is a framework for building and accelerating neural data processing pipelines, particularly for unstructured multimodal data and LLM orchestration.

AI/ML Data Curation Library
Python
1.7k

bespokelabsai/curator

A Python library for generating and curating high-quality synthetic data for AI model training and structured data extraction.

Airflow Extension for dbt Orchestration
Apache Airflow
1.2k

astronomer/astronomer-cosmos

Integrates dbt Core projects into Apache Airflow DAGs and Task Groups, enabling robust orchestration of data transformations.

Data Processing CLI Tool
4.4k

rom1504/img2dataset

An efficient command-line tool that downloads images from large lists of URLs, resizes them, and packages them into ready-to-use datasets for machine learning.

Data Orchestration Platform
Docker
14.2k

apache/dolphinscheduler

Apache DolphinScheduler is a modern, low-code data orchestration platform designed for the agile creation of high-performance workflows and for managing complex data pipeline dependencies.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.