
    Advanced Airflow: Dynamic DAGs and TaskGroups

Aivena Engineering · 2026-02-11 · 2 min read


    As data platforms scale, manually managing DAG files becomes a liability. A senior data engineer's goal is to move from Pipeline Management to Platform Engineering. This means building systems that generate pipelines.

    The Architecture of Dynamic Generation

    A professional dynamic DAG setup separates the Configuration (what to run) from the Implementation (how to run).

```mermaid
graph LR
    subgraph GitRepo [Git Repository]
        Config[YAML Configs]
        Factory[Python DAG Factory]
    end
    subgraph AivenaOS [Aivena Data OS]
        GitSync[Git-Sync Sidecar]
        Scheduler[Airflow Scheduler]
        Workers[K8s Workers]
    end
    Config -->|Push| GitSync
    Factory -->|Push| GitSync
    GitSync -->|Mount| Scheduler
    Scheduler -->|Parse| Factory
    Factory -->|Generate| DAGs[50+ Dynamic DAGs]
    DAGs -->|Execute| Workers
    style AivenaOS fill:#f5f7ff,stroke:#4a6cf7,stroke-width:2px
    style GitRepo fill:#fff,stroke:#333
```

    1. The Dynamic DAG Factory

Instead of 50 files, we use one Python file that iterates over a configuration manifest.

    Configuration (config/etl_manifest.yaml)

```yaml
common_settings:
  owner: "data-platform"
  retries: 3

pipelines:
  - id: "user_ingestion"
    source_table: "raw_users"
    schedule: "@hourly"
    cpu: "500m"

  - id: "transaction_processing"
    source_table: "raw_transactions"
    schedule: "*/15 * * * *"
    cpu: "1000m"
```
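The value of the `common_settings` block is that shared defaults are declared once and per-pipeline keys can override them. A minimal sketch of that merge in plain Python (the inline `manifest` dict mirrors `config/etl_manifest.yaml`; the `resolve` helper is illustrative, not part of the factory):

```python
# Illustrative manifest mirroring config/etl_manifest.yaml
manifest = {
    "common_settings": {"owner": "data-platform", "retries": 3},
    "pipelines": [
        {"id": "user_ingestion", "schedule": "@hourly", "cpu": "500m"},
        {"id": "transaction_processing", "schedule": "*/15 * * * *", "cpu": "1000m"},
    ],
}

def resolve(pipe, common):
    # Dict-unpacking merge: per-pipeline keys win over shared defaults.
    return {**common, **pipe}

resolved = [resolve(p, manifest["common_settings"]) for p in manifest["pipelines"]]
```

Every resolved entry now carries `owner` and `retries` without repeating them in each pipeline stanza.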

    Implementation (dag_factory.py)

```python
import os
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.python import PythonOperator


def create_dag(dag_id, schedule, cpu_limit, default_args):
    with DAG(
        dag_id=dag_id,
        schedule_interval=schedule,
        default_args=default_args,
        catchup=False,
    ) as dag:
        # Define tasks here
        pass
    return dag


# Load config
config_path = os.path.join(os.path.dirname(__file__), 'config/etl_manifest.yaml')
with open(config_path, 'r') as f:
    manifest = yaml.safe_load(f)

for pipe in manifest['pipelines']:
    dag_id = f"dynamic_{pipe['id']}"
    globals()[dag_id] = create_dag(
        dag_id=dag_id,
        schedule=pipe['schedule'],
        cpu_limit=pipe['cpu'],
        default_args={
            'owner': manifest['common_settings']['owner'],
            'start_date': datetime(2026, 1, 1),
            'retries': manifest['common_settings']['retries'],
        },
    )
```

Pro Tip: Always assign each generated DAG via globals()[dag_id] at the top level of the module. Airflow's DagBag only discovers objects of type DAG in the global namespace of Python files in the DAGs folder; a DAG created inside a function and never exported is silently ignored.
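The registration mechanic itself is plain Python, so it can be verified without Airflow installed. A minimal sketch, with a hypothetical `FakeDag` standing in for `airflow.DAG`:

```python
# FakeDag is a stand-in for airflow.DAG; the discovery mechanics are
# identical: any object bound in a module's global namespace is visible
# to code that scans that namespace, which is exactly what DagBag does.
class FakeDag:
    def __init__(self, dag_id):
        self.dag_id = dag_id

pipeline_ids = ["user_ingestion", "transaction_processing"]

for pid in pipeline_ids:
    dag_id = f"dynamic_{pid}"
    # Binding through globals() places the object at module top level.
    globals()[dag_id] = FakeDag(dag_id)
```

If you instead held the DAGs only in a local list, a namespace scan would find nothing, which is why the factory loop writes into globals() explicitly.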

    2. Organizing Complexity with TaskGroups

    Spaghetti DAGs are hard to debug. TaskGroup allows you to group related tasks visually and logically.

```python
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup

with DAG(dag_id="complex_processing", ...) as dag:
    start = EmptyOperator(task_id="start")

    with TaskGroup("ingestion_layer") as ingestion:
        t1 = PythonOperator(task_id="fetch_api", ...)
        t2 = PythonOperator(task_id="validate_json", ...)
        t1 >> t2

    with TaskGroup("transformation_layer") as transformation:
        t3 = PythonOperator(task_id="clean_data", ...)
        t4 = PythonOperator(task_id="calculate_metrics", ...)
        t3 >> t4

    end = EmptyOperator(task_id="end")

    start >> ingestion >> transformation >> end
```

    In the Aivena UI, these appear as collapsible boxes. This isn't just aesthetic—it prevents the "Grid View" from becoming unreadable as your pipeline grows to 100+ tasks.

    Why Aivena?

    Managing dynamic DAGs requires a high-performance scheduler and a robust file-sync mechanism.

    * Rapid Parsing: Aivena's Airflow deployment is tuned for fast DAG file parsing, ensuring that changes to your YAML configs are reflected in the UI in seconds.

    * Integrated VSCode: Use the Aivena One-Click IDE to live-edit your factory and see the DAG appear in the next scheduler cycle.
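How quickly a YAML change shows up as a new DAG is governed by the scheduler's file-processing settings. A sketch of the standard Airflow environment-variable overrides (`AIRFLOW__<SECTION>__<KEY>` convention); the values shown are illustrative, not Aivena's actual defaults:

```shell
# Re-parse each DAG file at most every 30 seconds
# (lower values mean faster pickup of config changes, at some CPU cost)
export AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=30

# How often to scan the DAGs folder for brand-new files
export AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=60
```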


    Master your orchestration. Deploy Managed Airflow on Aivena Data OS.