Data Lineage with Apache Airflow
With Airflow now ubiquitous for DAG orchestration, organizations increasingly depend on it to manage complex inter-DAG dependencies and provide up-to-date runtime visibility into DAG execution. But what effects, if any, would upstream DAGs have on downstream DAGs if dataset consumption were delayed? In this talk, we introduce Marquez, an open-source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. We will demonstrate how metadata management with Marquez helps maintain inter-DAG dependencies, catalog historical runs of DAGs, and minimize data quality issues.
Topics Covered