Advanced Airflow: Dynamic DAGs and TaskGroups
As data platforms scale, manually managing DAG files becomes a liability. A senior data engineer's goal is to move from pipeline management to platform engineering: building systems that generate pipelines rather than writing each one by hand.
The Architecture of Dynamic Generation
A professional dynamic DAG setup separates the Configuration (what to run) from the Implementation (how to run).
1. The Dynamic DAG Factory
Instead of 50 near-identical files, we use one Python file that iterates over a configuration manifest.
Configuration (config/etl_manifest.yaml)
common_settings:
  owner: "data-platform"
  retries: 3

pipelines:
  - id: "user_ingestion"
    source_table: "raw_users"
    schedule: "@hourly"
    cpu: "500m"

  - id: "transaction_processing"
    source_table: "raw_transactions"
    schedule: "*/15 * * * *"
    cpu: "1000m"
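Before feeding the manifest to a factory, it pays to validate its shape so a typo in the YAML fails loudly at parse time instead of silently dropping a pipeline. A minimal sketch in pure Python (the parsed manifest is inlined as a dict to stand in for yaml.safe_load's output, and the validate_manifest helper is hypothetical, not part of Airflow):

```python
# Hypothetical validation helper: checks that the parsed manifest has the
# keys a DAG factory would expect before any DAG objects are created.
REQUIRED_PIPELINE_KEYS = {"id", "source_table", "schedule", "cpu"}

def validate_manifest(manifest):
    errors = []
    for key in ("owner", "retries"):
        if key not in manifest.get("common_settings", {}):
            errors.append(f"common_settings missing '{key}'")
    seen_ids = set()
    for i, pipe in enumerate(manifest.get("pipelines", [])):
        missing = REQUIRED_PIPELINE_KEYS - pipe.keys()
        if missing:
            errors.append(f"pipeline {i} missing keys {sorted(missing)}")
        if pipe.get("id") in seen_ids:
            errors.append(f"duplicate pipeline id '{pipe['id']}'")
        seen_ids.add(pipe.get("id"))
    return errors

# Stand-in for yaml.safe_load() on the manifest above.
manifest = {
    "common_settings": {"owner": "data-platform", "retries": 3},
    "pipelines": [
        {"id": "user_ingestion", "source_table": "raw_users",
         "schedule": "@hourly", "cpu": "500m"},
        {"id": "transaction_processing", "source_table": "raw_transactions",
         "schedule": "*/15 * * * *", "cpu": "1000m"},
    ],
}
```

Running the validator at the top of the factory file means a malformed manifest surfaces as an import error in the scheduler logs, which is far easier to spot than a missing DAG.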
Implementation (dag_factory.py)
import os
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.python import PythonOperator


def create_dag(dag_id, schedule, cpu_limit, default_args):
    with DAG(
        dag_id=dag_id,
        schedule_interval=schedule,
        default_args=default_args,
        catchup=False,
    ) as dag:
        # Define tasks here, passing cpu_limit through to
        # executor_config or operator resources as appropriate.
        pass
    return dag


# Load config
config_path = os.path.join(os.path.dirname(__file__), 'config/etl_manifest.yaml')
with open(config_path, 'r') as f:
    manifest = yaml.safe_load(f)

for pipe in manifest['pipelines']:
    dag_id = f"dynamic_{pipe['id']}"
    globals()[dag_id] = create_dag(
        dag_id=dag_id,
        schedule=pipe['schedule'],
        cpu_limit=pipe['cpu'],
        default_args={
            'owner': manifest['common_settings']['owner'],
            'start_date': datetime(2026, 1, 1),
            'retries': manifest['common_settings']['retries'],
        },
    )
Pro Tip: Always use globals()[dag_id] in the top-level scope. Airflow scans for objects of type DAG in the global namespace of Python files in the DAGs folder.
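The mechanics behind that tip can be sketched without Airflow installed: after importing a DAG file, the scheduler's DagBag keeps the top-level objects that are DAG instances, which is why assigning through globals() makes a dynamically built object discoverable. A stand-in sketch (FakeDag and collect_dags are placeholders illustrating the scan, not Airflow APIs):

```python
# Stand-in for airflow.DAG, used only to illustrate namespace scanning.
class FakeDag:
    def __init__(self, dag_id):
        self.dag_id = dag_id

def collect_dags(namespace):
    # Mirrors what Airflow does after importing a DAG file:
    # keep every top-level object that is a DAG instance.
    return {k: v for k, v in namespace.items() if isinstance(v, FakeDag)}

for i in range(3):
    dag_id = f"dynamic_pipeline_{i}"
    # Without this assignment the object would be a loop-local value,
    # invisible to the module-level scan.
    globals()[dag_id] = FakeDag(dag_id)

discovered = collect_dags(globals())
```

This is also why building DAGs inside a function and forgetting to return or register them is a classic silent failure: the objects exist briefly but never reach the module's global namespace.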
2. Organizing Complexity with TaskGroups
Spaghetti DAGs are hard to debug. TaskGroup allows you to group related tasks visually and logically.
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup

with DAG(dag_id="complex_processing", ...) as dag:
    start = EmptyOperator(task_id="start")

    with TaskGroup("ingestion_layer") as ingestion:
        t1 = PythonOperator(task_id="fetch_api", ...)
        t2 = PythonOperator(task_id="validate_json", ...)
        t1 >> t2

    with TaskGroup("transformation_layer") as transformation:
        t3 = PythonOperator(task_id="clean_data", ...)
        t4 = PythonOperator(task_id="calculate_metrics", ...)
        t3 >> t4

    end = EmptyOperator(task_id="end")

    start >> ingestion >> transformation >> end
In the Aivena UI, these appear as collapsible boxes. This isn't just cosmetic: it keeps the Grid View readable as your pipeline grows past 100 tasks.
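One practical consequence of grouping: Airflow prefixes each child's task_id with its group id (for example, fetch_api becomes ingestion_layer.fetch_api), so task IDs stay unique across groups. A minimal pure-Python sketch of the resulting IDs (the build_task_ids helper is hypothetical, for illustration only):

```python
def build_task_ids(groups):
    # Airflow-style prefixing: "<group_id>.<task_id>" for grouped tasks.
    return [f"{group}.{task}"
            for group, tasks in groups.items()
            for task in tasks]

layers = {
    "ingestion_layer": ["fetch_api", "validate_json"],
    "transformation_layer": ["clean_data", "calculate_metrics"],
}
```

Keep this prefixing in mind when writing sensors or trigger rules that reference tasks by ID: the bare task_id alone will not resolve once the task lives inside a group.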
Why Aivena?
Managing dynamic DAGs requires a high-performance scheduler and a robust file-sync mechanism.
* Rapid Parsing: Aivena's Airflow deployment is tuned for fast DAG file parsing, ensuring that changes to your YAML configs are reflected in the UI in seconds.
* Integrated VSCode: Use the Aivena One-Click IDE to live-edit your factory and see the DAG appear in the next scheduler cycle.
Master your orchestration. Deploy Managed Airflow on Aivena Data OS.