DagSmith Documentation

Smith production-ready Airflow DAGs from YAML — schema-validated, registry-driven, GCP-ready.

What is DagSmith?

DagSmith is a code-generation framework that compiles structured YAML pipeline definitions into fully typed, production-ready Apache Airflow DAG files. Instead of writing repetitive Python boilerplate for each DAG, you declare your pipeline in YAML and DagSmith handles the rest: imports, operator instantiation, dependency wiring, and code formatting.

Why DagSmith?

Author-Time Validation

Pydantic schemas catch bad config before code generation, not at Airflow deploy time. Get fast feedback with precise error messages.
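
To make the idea concrete, here is a minimal sketch of author-time validation. It uses a stdlib dataclass as a stand-in for a Pydantic model so that it runs without extra dependencies. The `BashTaskSpec` name, its fields, and the validation rules are hypothetical and are not DagSmith's actual schema.

```python
from dataclasses import dataclass


@dataclass
class BashTaskSpec:
    """Hypothetical task spec; a dataclass standing in for a Pydantic model."""

    task_id: str
    bash_command: str
    retries: int = 0

    def __post_init__(self) -> None:
        # Collect every problem so the author sees all of them in one pass.
        errors = []
        # Hypothetical rule: task_id must be usable as a Python identifier.
        if not isinstance(self.task_id, str) or not self.task_id.isidentifier():
            errors.append(f"task_id: {self.task_id!r} is not a valid identifier")
        if not isinstance(self.bash_command, str) or not self.bash_command.strip():
            errors.append("bash_command: must be a non-empty string")
        if not isinstance(self.retries, int) or self.retries < 0:
            errors.append(f"retries: {self.retries!r} must be a non-negative integer")
        if errors:
            raise ValueError("; ".join(errors))


def validate_task(raw: dict) -> BashTaskSpec:
    """Build a spec from a parsed YAML mapping, failing before any code is generated."""
    return BashTaskSpec(**raw)
```

A bad mapping such as `{"task_id": "bad id", "bash_command": ""}` raises a `ValueError` naming both fields, so the mistake surfaces before any DAG file is written.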

Pluggable Registry

Register new operators or sensors by adding entries to a YAML config file, with no Python code changes. First-party, third-party, and custom operators coexist cleanly.
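
The registry idea can be sketched as a lookup table from an operator name to its import location. The entry shape below (`module`, `kind`) and the signature of `get_import_line` are assumptions for illustration; the real format of the registry file is not shown on this page. The Airflow module paths are the Airflow 2.x locations, and `my_pkg.operators` in the usage note is a made-up package.

```python
# Hypothetical in-memory registry; in practice this data would be loaded from YAML.
REGISTRY = {
    "BashOperator": {"module": "airflow.operators.bash", "kind": "operator"},
    "BigQueryInsertJobOperator": {
        "module": "airflow.providers.google.cloud.operators.bigquery",
        "kind": "operator",
    },
    "GCSObjectsWithPrefixExistenceSensor": {
        "module": "airflow.providers.google.cloud.sensors.gcs",
        "kind": "sensor",
    },
}


def get_import_line(name: str, registry: dict = REGISTRY) -> str:
    """Return the import statement for a registered operator or sensor."""
    try:
        entry = registry[name]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise KeyError(f"{name!r} is not registered (known: {known})") from None
    return f"from {entry['module']} import {name}"
```

Under these assumptions, supporting a custom operator only requires a new entry, for example `{"MyOperator": {"module": "my_pkg.operators", "kind": "operator"}}`; the lookup code itself does not change.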

GCP-Native

First-class BigQuery and GCS operator support with automatic FinOps label injection for cost tracking and attribution.
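
Label injection can be pictured as merging a set of default labels into each BigQuery job configuration before the task is rendered. This is a minimal sketch, assuming a hypothetical `inject_labels` helper and hypothetical label keys (`team`, `cost_center`). It also assumes that labels already set on a job take precedence over the injected defaults, which may differ from DagSmith's behavior.

```python
import copy


def inject_labels(configuration: dict, finops_labels: dict) -> dict:
    """Return a copy of a BigQuery job configuration with FinOps labels merged in.

    Labels already present on the job win over the injected defaults
    (an assumed precedence rule).
    """
    merged = copy.deepcopy(configuration)
    merged["labels"] = {**finops_labels, **merged.get("labels", {})}
    return merged


config = {
    "query": {"query": "SELECT 1", "useLegacySql": False},
    "labels": {"team": "growth"},
}
labeled = inject_labels(config, {"team": "data-platform", "cost_center": "cc-1234"})
```

The input mapping is left untouched, so the same defaults can be applied to many task configurations without side effects.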

Clean Output

Generated DAGs are human-readable, ruff-formatted Python that you can review, version-control, and deploy with confidence.

Full CLI Toolkit

Generate, validate, list registered operators, and resolve variables — all from the command line with rich, colorized output.

16+ Built-In Operators

BigQuery, GCS, Python, Bash, Branching, Sensors, Triggers, TaskGroups — each with dedicated Pydantic models and field-level validation.

How It Works

1. Define
Write your pipeline in YAML: metadata, DAG config, GCP settings, tasks, and dependencies.
2. Validate
DagSmith expands variables, parses YAML, and validates every field via Pydantic schemas.
3. Generate
The code generator renders a clean, importable .py DAG file with all imports, operators, and dependencies.
4. Format
Output is auto-formatted with ruff for consistent style and clean imports.
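
Step 2 starts with variable expansion. The sketch below shows one way `${VAR}` placeholders could be expanded in raw YAML text before it is parsed. The `expand_vars` name, the strict failure on undefined variables, and the sample YAML keys are assumptions, not DagSmith's documented behavior.

```python
import re

# Matches ${NAME} where NAME looks like an environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_vars(text: str, variables: dict) -> str:
    """Replace each ${NAME} in text with variables[NAME].

    Raises KeyError listing every undefined name rather than silently
    leaving placeholders in the output.
    """
    missing = sorted(
        {name for name in _VAR_PATTERN.findall(text) if name not in variables}
    )
    if missing:
        raise KeyError(f"undefined variables: {', '.join(missing)}")
    return _VAR_PATTERN.sub(lambda match: str(variables[match.group(1)]), text)


raw_yaml = """\
gcp:
  project_id: ${GCP_PROJECT}
  dataset: analytics_${ENV}
"""
expanded = expand_vars(raw_yaml, {"GCP_PROJECT": "my-project", "ENV": "dev"})
```

Passing `os.environ` as `variables` would expand placeholders from the process environment. Failing on undefined names keeps a typo in a variable name from reaching the generated DAG.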

Supported Operators at a Glance

Category   Operator / Sensor                      Type
Standard   PythonOperator                         operator
Standard   BranchPythonOperator                   operator
Standard   BashOperator                           operator
Standard   EmptyOperator                          operator
Standard   TriggerDagRunOperator                  operator
Standard   ExternalTaskSensor                     sensor
BigQuery   BigQueryInsertJobOperator              operator
BigQuery   BigQueryCheckOperator                  operator
BigQuery   BigQueryValueCheckOperator             operator
BigQuery   BigQueryTableExistenceSensor           sensor
GCS        GCSToBigQueryOperator                  operator
GCS        GCSToGCSOperator                       operator
GCS        GCSDeleteObjectsOperator               operator
GCS        GCSObjectsWithPrefixExistenceSensor    sensor
Utility    TaskGroup                              util
Plugin     Any registered operator/sensor         generic

Project Layout

Directory Structure
dagsmith/
  src/
    dagsmith/
      __init__.py                # version
      __main__.py                # python -m dagsmith
      cli/                       # CLI: generate, validate, list, resolve
      loader.py                  # YAML loading + ${VAR} expansion + validation
      code_generator.py          # renders YamlDagSpec -> .py string
      callables.py               # dotted-path -> (module, fn, alias) resolver
      dependencies.py            # >> / << dependency string parser
      cron.py                    # cron expression humanizer
      utils.py                   # py_repr, safe_var, humanize_readable_time
      constants.py               # color constants, TriggerRule enum
      configs/
        airflow_registry.yaml    # operator/sensor registry + FinOps labels
      registry/
        core.py                  # loads airflow_registry.yaml, get_import_line
        models.py                # RegistryEntry, RegistryConfig Pydantic models
      schemas/
        __init__.py              # YamlDagSpec root model, discriminated unions
        base.py                  # BaseTaskSpec, BaseSensorOperatorSpec, DagSpec
        generic.py               # GenericOperatorSpec, GenericSensorSpec
        shared_renderers.py      # render_common_fields
        bigquery/                # BQ operator/sensor specs + renderers
        gcs/                     # GCS operator/sensor specs + renderers
        standard/                # PythonOperator, BashOperator, etc.
  tests/                         # mirrors src/dagsmith/ layout
  pyproject.toml
  Dockerfile
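
To illustrate what a `>>` / `<<` dependency string parser like the one listed for `dependencies.py` might do, the following sketch turns a chain expression into (upstream, downstream) edges. The `parse_dependency` name and the supported grammar (single task IDs chained with `>>` or `<<`, with no lists or parentheses) are assumptions and may not match the real module.

```python
import re

_OPERATOR = re.compile(r"\s*(>>|<<)\s*")
_TASK_ID = re.compile(r"[\w.-]+")


def parse_dependency(expr: str) -> list[tuple[str, str]]:
    """Parse a chain such as 'a >> b << c' into (upstream, downstream) edges."""
    # Splitting on a capturing group yields: task, op, task, op, task, ...
    parts = _OPERATOR.split(expr.strip())
    tasks, ops = parts[0::2], parts[1::2]
    if not ops:
        raise ValueError(f"no '>>' or '<<' found in {expr!r}")
    for task in tasks:
        if not _TASK_ID.fullmatch(task):
            raise ValueError(f"invalid task reference {task!r} in {expr!r}")
    edges = []
    # Each operator links its left and right neighbors, then the chain
    # continues from the right operand, as it does for Airflow tasks.
    for left, op, right in zip(tasks, ops, tasks[1:]):
        edges.append((left, right) if op == ">>" else (right, left))
    return edges
```

For example, `parse_dependency("extract >> transform >> load")` yields the edges `("extract", "transform")` and `("transform", "load")`, while `"load << transform"` yields `("transform", "load")`.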