YAML Spec Format
Every section and field in the DagSmith YAML specification, explained.
Section Order
DagSmith YAML specs follow a conventional section order. All sections except metadata, dag, and gcp are optional.
variables: # 1. Optional - key-value pairs for ${VAR} substitution
configurations: # 2. Optional - reusable config values
metadata: # 3. Required - documentation metadata
dag: # 4. Required - DAG constructor arguments
gcp: # 5. Required - GCP connection defaults
default_args: # 6. Optional - Airflow default_args
user_defined_macros: # 7. Optional - Jinja macros
tasks: # 8. Optional - operator/sensor/group specs
dependencies: # 9. Optional - task execution order
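Only three of the nine sections are mandatory. The check below is a minimal Python sketch of that rule applied to an already-parsed spec dictionary. The helper names are hypothetical, and DagSmith's own loader may enforce this differently.

```python
# Hypothetical helpers illustrating the section rules; not DagSmith's API.
REQUIRED_SECTIONS = {"metadata", "dag", "gcp"}
OPTIONAL_SECTIONS = {
    "variables", "configurations", "default_args",
    "user_defined_macros", "tasks", "dependencies",
}


def missing_required_sections(spec: dict) -> list[str]:
    """Return the required top-level sections absent from a parsed spec."""
    return sorted(REQUIRED_SECTIONS - set(spec))


def unknown_sections(spec: dict) -> list[str]:
    """Return top-level keys that are not one of the nine documented sections."""
    return sorted(set(spec) - REQUIRED_SECTIONS - OPTIONAL_SECTIONS)
```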
1. Variables
optional
Key-value pairs for ${VAR_NAME} substitution throughout the entire YAML. Expansion happens before Pydantic validation, so variables work in every section.
Naming rules (strictly enforced)
| Rule | Example |
|---|---|
| Must be ALL UPPERCASE | VAR__PROJECT_ID__VAR |
| Must begin with VAR__ | VAR__DATASET__VAR |
| Must end with __VAR | VAR__ENV__VAR |
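The three naming rules can be checked with a single regular expression. This is a minimal sketch. It assumes that only uppercase letters, digits and underscores may appear between the VAR__ prefix and the __VAR suffix, which is stricter than the table states.

```python
import re

# Assumption: only A-Z, 0-9 and "_" are allowed between the prefix and suffix.
VAR_NAME_PATTERN = re.compile(r"VAR__[A-Z0-9_]+__VAR")


def is_valid_var_name(name: str) -> bool:
    """Return True if name is uppercase, starts with VAR__ and ends with __VAR."""
    return VAR_NAME_PATTERN.fullmatch(name) is not None
```

For example, VAR__PROJECT_ID__VAR passes. The checker rejects var__env__var because it is lowercase, and rejects VAR__ENV because it lacks the suffix.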
variables:
  VAR__PROJECT_ID__VAR: "my-gcp-project-001"
  VAR__DATASET__VAR: "warehouse_tables"
  VAR__ENV__VAR: "prod"
  VAR__BUNDLE__VAR: "daily_load"

# Usage anywhere in the spec:
gcp:
  project_id: "${VAR__PROJECT_ID__VAR}"  # expands to "my-gcp-project-001"
configurations:
  base_path: "/home/airflow/gcs/dags/${VAR__PROJECT_ID__VAR}/"
When to use variables: Same spec deployed across environments (dev/staging/prod), repeated values like project_id in multiple tasks, paths with embedded identifiers.
When to skip: Single-environment DAGs with no repeated values, quick prototyping where indirection adds noise.
How expansion works
- DagSmith parses the variables section from the raw YAML.
- All ${VAR__...__VAR} references are replaced with their values as plain strings.
- The expanded YAML is then parsed and validated by Pydantic.
- Invalid variable names are rejected with a clear error before any expansion happens.
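The expansion steps can be sketched as a text substitution over the raw YAML. This runs before any YAML parsing. The sketch makes three assumptions. It assumes the variables dict has already been read from the variables section. It raises KeyError for a reference to an undefined variable. It leaves any ${...} that does not match the VAR__...__VAR pattern (for example a shell ${HOME}) untouched. DagSmith's real behavior in those last two cases may differ.

```python
import re

NAME_PATTERN = re.compile(r"VAR__[A-Z0-9_]+__VAR")
# Only ${VAR__...__VAR} references are substituted; other ${...} text is kept.
REF_PATTERN = re.compile(r"\$\{(VAR__[A-Z0-9_]+__VAR)\}")


def expand_variables(raw_yaml: str, variables: dict) -> str:
    """Return raw_yaml with every ${VAR__...__VAR} replaced by its value."""
    # Step 1: reject invalid names before touching the text.
    bad = [name for name in variables if not NAME_PATTERN.fullmatch(name)]
    if bad:
        raise ValueError(f"Invalid variable names: {bad}")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Assumption: an undefined reference is an error.
            raise KeyError(f"Undefined variable: {name}")
        # Step 2: values are inserted as plain strings.
        return str(variables[name])

    # Step 3 (not shown): the caller parses the result and validates it.
    return REF_PATTERN.sub(substitute, raw_yaml)
```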
2. Configurations
optional — default: base_path="/home/airflow/gcs/dags"
Reusable typed configuration values. Unlike variables (which are substituted as strings), configurations are preserved as typed values. Supports arbitrary additional keys.
configurations:
  base_path: "/home/airflow/gcs/dags/${VAR__PROJECT_ID__VAR}/"
  # custom_key: "any additional config"  # extra keys allowed
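The sketch below shows what "typed values with arbitrary extra keys" means in practice. It uses a stdlib dataclass as a hypothetical stand-in for DagSmith's Pydantic model. base_path keeps its documented default. Any other key is stored as-is, so an integer stays an integer instead of becoming a string.

```python
from dataclasses import dataclass, field
from typing import Any

DEFAULT_BASE_PATH = "/home/airflow/gcs/dags"


@dataclass
class Configurations:
    """Hypothetical stand-in for the configurations model (not DagSmith's class)."""
    base_path: str = DEFAULT_BASE_PATH
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Configurations":
        data = dict(data)  # copy so the caller's dict is not mutated
        base_path = data.pop("base_path", DEFAULT_BASE_PATH)
        # Remaining keys are kept with their original Python types.
        return cls(base_path=base_path, extra=data)
```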
3. Metadata
required — all fields required
Documentation metadata rendered into the generated file's docstring header.
metadata:
  title: "Daily Account Activity Load"
  owner: "data-team@example.com"
  email: "data-team@example.com"
  version: "1.0.0"
  jira: "PROJ-1234"
  developer_name: "daily_load"
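Because every metadata field is required, a renderer can fail fast before producing a docstring. This is a minimal sketch. The function name and the "key: value" docstring layout are hypothetical, and it treats an empty value as missing. DagSmith's generated header will look different.

```python
REQUIRED_METADATA = ("title", "owner", "email", "version", "jira", "developer_name")


def render_docstring(metadata: dict) -> str:
    """Build a module docstring from metadata (hypothetical layout)."""
    missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
    if missing:
        raise ValueError(f"metadata is missing required fields: {missing}")
    lines = [f"{key}: {metadata[key]}" for key in REQUIRED_METADATA]
    return '"""\n' + "\n".join(lines) + '\n"""'
```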
4. DAG
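required — maps to airflow.DAG() constructor
| Field | Notes |
|---|---|
| description | Default: "" |
| schedule | Cron expression or preset (@daily, @hourly, etc.). Default: None (manual only). Alias: schedule_interval |
| start_date | YYYY-MM-DD or YYYY-MM-DD HH:MM:SS. Default: now |
| timezone | Default: "UTC" |
| catchup | Default: false |
| max_active_runs | Default: 1 |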
dag:
  dag_id: "daily_account_load"
  description: "Load daily account activity into BigQuery."
  schedule: "0 6 * * *"  # 6 AM daily
  start_date: "2026-01-02 12:13:14"
  timezone: "America/New_York"
  catchup: false
  max_active_runs: 1
  dagrun_timeout: 7200
  is_paused_upon_creation: true
  tags:
    - "warehouse:bigquery"
    - "module:daily_load"
  # DAG-level params (optional)
  params:
    env:
      type: "string"
      default: "PROD"
      enum: ["PROD", "PLE", "DEV"]
      title: "Environment"
      description: "Target environment"
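start_date accepts two string formats plus a timezone name. The sketch below shows one way to turn them into a timezone-aware datetime. It assumes the value is a string or None, with None meaning "now". The function name is hypothetical. Non-UTC zones use the standard library's zoneinfo, which needs the system or tzdata time zone database.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

START_DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_start_date(value, tz_name: str = "UTC") -> datetime:
    """Parse YYYY-MM-DD or YYYY-MM-DD HH:MM:SS into an aware datetime."""
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)
    if value is None:
        return datetime.now(tz)  # documented default: now
    for fmt in START_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=tz)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(
        f"start_date must be YYYY-MM-DD or YYYY-MM-DD HH:MM:SS, got {value!r}"
    )
```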
5. GCP
required — GCP connection defaults shared across tasks
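| Field | Notes |
|---|---|
| project_id | Default: None |
| location | For example "US" or "us-east4". Default: None |
| gcp_conn_id | Default: "google_cloud_default". Alias: google_cloud_conn_id |
| impersonation_chain | Default: None |
| deferrable | Default: false |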
gcp:
  project_id: "my-gcp-project-001"
  location: "US"
  gcp_conn_id: "google_cloud_default"
  # impersonation_chain: "sa@project.iam.gserviceaccount.com"
  # deferrable: false
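"Shared across tasks" can be pictured as filling in any GCP argument a task did not set itself. This is a minimal sketch. It assumes that a value set on the task wins over the gcp default, and that None defaults are skipped. Both are assumptions, and the helper name is hypothetical.

```python
GCP_KEYS = ("project_id", "location", "gcp_conn_id", "impersonation_chain", "deferrable")


def apply_gcp_defaults(task_kwargs: dict, gcp: dict) -> dict:
    """Return task kwargs with unset GCP arguments filled from the gcp section."""
    merged = dict(task_kwargs)
    for key in GCP_KEYS:
        value = gcp.get(key)
        if value is not None:
            merged.setdefault(key, value)  # task-level value takes precedence
    return merged
```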
6. Default Args
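optional — applied to every task in the DAG
| Field | Notes |
|---|---|
| owner | Default: "airflow" |
| depends_on_past | Default: false |
| retries | Default: 3 |
| retry_delay | Seconds. Default: 60. Alias: retry_delay_seconds |
| sla | Seconds. Default: None. Alias: sla_seconds |
| deferrable | Default: true |
| email | Default: [] |
| email_on_failure | Default: false |
| email_on_retry | Default: false |
| on_failure_callback | Default: None |
| on_success_callback | Default: None |
| on_retry_callback | Default: None |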
default_args:
  owner: "airflow"
  depends_on_past: false
  retries: 3
  retry_delay: 60
  deferrable: true
  email:
    - "data-team@example.com"
  email_on_failure: true
  email_on_retry: false
  # on_failure_callback: "mypackage.callbacks.on_failure"
  # on_success_callback: "mypackage.callbacks.on_success"
  # on_retry_callback: "mypackage.callbacks.on_retry"
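retry_delay and sla are given in seconds, and each has a *_seconds alias. Airflow commonly takes these as datetime.timedelta objects. The sketch below shows a plausible conversion. It is an assumption that DagSmith resolves the aliases and converts to timedelta this way, and the function name is hypothetical.

```python
from datetime import timedelta

ALIASES = {"retry_delay_seconds": "retry_delay", "sla_seconds": "sla"}


def to_airflow_default_args(spec: dict) -> dict:
    """Resolve aliases and convert second-based fields to timedelta."""
    args = {ALIASES.get(key, key): value for key, value in spec.items()}
    for key in ("retry_delay", "sla"):
        if args.get(key) is not None:
            args[key] = timedelta(seconds=args[key])
    return args
```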
7. User Defined Macros
optional — Jinja macros injected into the DAG constructor
All keys become macro names available as {{ macro_name }} in Jinja templates. Values must be scalars (str, int, float, bool, None).
user_defined_macros:
  project_name: "my-gcp-project-001"  # {{ project_name }} in templates
  fact_dataset: "warehouse_tables"    # {{ fact_dataset }}
  bundle: "daily_load"
  env: "prod"
  datastore: "BQ"
  latency: 1
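The scalar-only rule for macro values is easy to enforce up front. This is a minimal sketch with a hypothetical function name. Because bool is a subclass of int in Python, booleans pass the check, as the rule intends.

```python
# Scalar types allowed as user_defined_macros values, per the rule above.
SCALAR_TYPES = (str, int, float, bool, type(None))


def validate_macros(macros: dict) -> dict:
    """Raise TypeError if any macro value is not a scalar; otherwise return macros."""
    bad = {
        name: type(value).__name__
        for name, value in macros.items()
        if not isinstance(value, SCALAR_TYPES)
    }
    if bad:
        raise TypeError(f"user_defined_macros values must be scalars: {bad}")
    return macros
```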
8. Tasks
optional — defaults to []
A list of operator, sensor, and TaskGroup specs. Each task uses the operator field as a discriminator to select the correct Pydantic model. See the Operators & Sensors page for details on each type.
Common base fields (all task types)
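| Field | Notes |
|---|---|
| operator | Discriminator that selects the spec model (for example BigQueryInsertJobOperator) |
| trigger_rule | Default: "all_success". Options: all_success, all_failed, all_done, one_success, one_failed, none_failed, etc. |
| retries | Default: default_args.retries |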
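Discriminator-based selection means that the operator value alone decides which spec class validates a task entry. The sketch below uses stdlib dataclasses and a plain registry dict in place of DagSmith's Pydantic discriminated union. BigQueryInsertJobSpec, its configuration field and build_task are all hypothetical. The sketch also shows retries falling back to default_args.retries when a task leaves it unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSpec:
    task_id: str
    operator: str
    trigger_rule: str = "all_success"
    retries: Optional[int] = None  # None means "inherit default_args.retries"


@dataclass
class BigQueryInsertJobSpec(TaskSpec):
    configuration: Optional[dict] = None  # hypothetical operator-specific field


# The operator string acts as the discriminator.
REGISTRY = {"BigQueryInsertJobOperator": BigQueryInsertJobSpec}


def build_task(raw: dict, default_args: dict) -> TaskSpec:
    operator = raw.get("operator")
    if operator not in REGISTRY:
        raise ValueError(f"Unknown operator {operator!r}; expected one of {sorted(REGISTRY)}")
    spec = REGISTRY[operator](**raw)
    if spec.retries is None:
        spec.retries = default_args.get("retries")
    return spec
```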
Sensor base fields (sensors only)
All sensors inherit these additional fields from BaseSensorOperatorSpec:
| Field | Notes |
|---|---|
| poke_interval | Default: 60.0 |
| mode | "poke" or "reschedule". Default: "poke" |
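The two sensor fields can be checked as soon as the spec is built. This is a minimal sketch. SensorSpecSketch is a hypothetical stand-in for BaseSensorOperatorSpec, and it covers only the defaults and the allowed mode values listed above.

```python
from dataclasses import dataclass

VALID_MODES = ("poke", "reschedule")


@dataclass
class SensorSpecSketch:
    """Hypothetical stand-in for BaseSensorOperatorSpec's shared fields."""
    task_id: str
    poke_interval: float = 60.0
    mode: str = "poke"

    def __post_init__(self) -> None:
        # Reject anything other than the two documented modes.
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")
```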
9. Dependencies
optional — defaults to [] (all tasks execute independently in parallel)
Defines task execution order using the >> (downstream) and << (upstream) operators. Dependency strings can reference both task_id and group_id values.
dependencies:
  - "task_a >> task_b >> task_c"        # sequential chain
  - "[task_x, task_y] >> task_z"        # fan-in (z waits for both x and y)
  - "task_z >> [task_a, task_b]"        # fan-out (both a and b run after z)
  - "task_c << [task_a, task_b]"        # fan-in (reverse notation)
  - "staging_group >> transform_group"  # task group references
Validation: All names in dependency strings are validated against declared task_id and group_id values at load time. An unknown name raises a ValueError with the list of valid names.
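The dependency syntax and its load-time validation can be sketched as a small parser that turns each string into (upstream, downstream) edges. It follows Airflow semantics: a << b means b runs before a. The function names are hypothetical. This is not DagSmith's parser, but it raises a ValueError listing the valid names, as described above.

```python
import re
from itertools import product


def _operand(text: str) -> list[str]:
    """Turn 'name' or '[a, b]' into a list of names."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        return [name.strip() for name in text[1:-1].split(",") if name.strip()]
    return [text]


def parse_dependencies(lines: list[str], valid_names: set[str]) -> list[tuple[str, str]]:
    """Return (upstream, downstream) edges for every dependency string."""
    edges = []
    for line in lines:
        parts = re.split(r"(>>|<<)", line)  # keep the operators in the result
        operands = [_operand(p) for p in parts[0::2]]
        operators = parts[1::2]
        for names in operands:
            unknown = [n for n in names if n not in valid_names]
            if unknown:
                raise ValueError(
                    f"Unknown name(s) {unknown} in {line!r}; valid names: {sorted(valid_names)}"
                )
        for left, op, right in zip(operands, operators, operands[1:]):
            upstream, downstream = (left, right) if op == ">>" else (right, left)
            edges.extend(product(upstream, downstream))
    return edges
```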