DagSmith Documentation
Smith production-ready Airflow DAGs from YAML — schema-validated, registry-driven, GCP-ready.
What is DagSmith?
DagSmith is a code-generation framework that compiles structured YAML pipeline definitions into fully typed, production-ready Apache Airflow DAG files. Instead of writing repetitive Python boilerplate for each DAG, you declare your pipeline in YAML and DagSmith handles the rest: imports, operator instantiation, dependency wiring, and code formatting.
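As a rough illustration of what "handles the rest" involves, the sketch below renders a small pipeline description into DAG source text. It is not DagSmith's generator: the spec layout, the `IMPORTS` table, and `render_dag` are hypothetical, the dict stands in for parsed YAML, and the import paths and `schedule` argument assume Airflow 2.4 or later.

```python
from textwrap import indent

# Hypothetical, simplified spec (a dict standing in for parsed YAML).
# The real DagSmith schema is richer and is not reproduced here.
SPEC = {
    "dag_id": "example_dag",
    "schedule": "@daily",
    "tasks": [
        {"task_id": "start", "operator": "EmptyOperator"},
        {
            "task_id": "say_hi",
            "operator": "BashOperator",
            "params": {"bash_command": "echo hi"},
        },
    ],
    "dependencies": [("start", "say_hi")],
}

# Assumed Airflow 2.x import paths for the two operators used above.
IMPORTS = {
    "EmptyOperator": "from airflow.operators.empty import EmptyOperator",
    "BashOperator": "from airflow.operators.bash import BashOperator",
}


def render_dag(spec: dict) -> str:
    """Render a spec dict into the source text of a DAG module."""
    # Imports: one line per distinct operator, in a stable order.
    operators = sorted({task["operator"] for task in spec["tasks"]})
    lines = ["from airflow import DAG"]
    lines += [IMPORTS[name] for name in operators]
    lines += [
        "",
        f"with DAG(dag_id={spec['dag_id']!r}, schedule={spec['schedule']!r}) as dag:",
    ]

    # Operator instantiation: one assignment per task.
    body = []
    for task in spec["tasks"]:
        kwargs = {"task_id": task["task_id"], **task.get("params", {})}
        args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
        body.append(f"{task['task_id']} = {task['operator']}({args})")

    # Dependency wiring: one ">>" statement per edge.
    body.append("")
    body += [f"{up} >> {down}" for up, down in spec["dependencies"]]

    lines.append(indent("\n".join(body), "    "))
    return "\n".join(lines) + "\n"


source = render_dag(SPEC)
print(source)
```

The generated text is ordinary Python, so it can be reviewed and formatted like any hand-written DAG file.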
Why DagSmith?
Author-Time Validation
Pydantic schemas catch bad config before code generation, not at Airflow deploy time. Get fast feedback with precise error messages.
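The idea of failing at authoring time can be sketched without any dependencies. DagSmith uses Pydantic models for this; the example below swaps in a standard-library dataclass so that it runs anywhere. The `BashTaskSpec` fields, the `SpecError` type, and the specific rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# A subset of Airflow's trigger rule names, used here only for illustration.
ALLOWED_TRIGGER_RULES = {"all_success", "all_failed", "all_done", "one_success"}


class SpecError(ValueError):
    """Raised when a task definition fails validation."""


@dataclass(frozen=True)
class BashTaskSpec:
    # Hypothetical field set; the real DagSmith model is not shown in this document.
    task_id: str
    bash_command: str
    retries: int = 0
    trigger_rule: str = "all_success"

    def __post_init__(self) -> None:
        # Assumed rule: the task id doubles as a variable name in generated code.
        if not self.task_id.isidentifier():
            raise SpecError(f"task_id {self.task_id!r} is not a valid identifier")
        if not self.bash_command.strip():
            raise SpecError(f"{self.task_id}: bash_command must not be empty")
        if self.retries < 0:
            raise SpecError(f"{self.task_id}: retries must be >= 0, got {self.retries}")
        if self.trigger_rule not in ALLOWED_TRIGGER_RULES:
            raise SpecError(f"{self.task_id}: unknown trigger_rule {self.trigger_rule!r}")


def validate(raw: dict) -> BashTaskSpec:
    """Build a spec from raw config, reporting every problem as SpecError."""
    try:
        return BashTaskSpec(**raw)
    except TypeError as exc:  # missing or unexpected keys
        raise SpecError(str(exc)) from exc


try:
    validate({"task_id": "extract", "bash_command": "echo hi", "retries": -1})
except SpecError as exc:
    print(f"rejected before code generation: {exc}")
```

Because the check runs before any Python is emitted, a bad value never reaches the scheduler.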
Pluggable Registry
Add new operators or sensors to a YAML config file with zero Python code changes. First-party, third-party, and custom operators coexist cleanly.
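A registry lookup of this kind might look like the sketch below. The dict stands in for the parsed registry file, and its keys (`module`, `kind`) and the `MyCustomSensor` entry are hypothetical. `get_import_line` is named after the helper listed in the project layout, but this body is only a guess at its behavior.

```python
# Hypothetical registry contents; the real registry file layout is not shown here.
REGISTRY = {
    "BashOperator": {"module": "airflow.operators.bash", "kind": "operator"},
    "GCSToGCSOperator": {
        "module": "airflow.providers.google.cloud.transfers.gcs_to_gcs",
        "kind": "operator",
    },
    # A made-up in-house sensor, registered the same way as the built-ins.
    "MyCustomSensor": {"module": "my_company.sensors", "kind": "sensor"},
}


def get_import_line(name: str, registry: dict = REGISTRY) -> str:
    """Return the import statement for a registered operator or sensor."""
    try:
        entry = registry[name]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise KeyError(f"{name!r} is not registered; known entries: {known}") from None
    return f"from {entry['module']} import {name}"


print(get_import_line("MyCustomSensor"))
```

Adding an operator then means adding one registry entry; the lookup code itself does not change.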
GCP-Native
First-class BigQuery and GCS operator support with automatic FinOps label injection for cost tracking and attribution.
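Label injection can be pictured as a merge into the job configuration before the operator is rendered. This sketch assumes the labels belong under the top-level `labels` key of a BigQuery job configuration and that labels written explicitly in the pipeline definition take precedence. The `inject_labels` helper and the label names are hypothetical, not DagSmith's implementation.

```python
import copy


def inject_labels(configuration: dict, finops_labels: dict) -> dict:
    """Return a copy of a BigQuery job configuration with FinOps labels merged in."""
    merged = copy.deepcopy(configuration)  # never mutate the caller's config
    labels = dict(finops_labels)
    # Assumption: labels already present in the configuration win over defaults.
    labels.update(merged.get("labels", {}))
    merged["labels"] = labels
    return merged


job_config = {
    "query": {"query": "SELECT 1", "useLegacySql": False},
    "labels": {"team": "analytics"},
}
defaults = {"team": "platform", "cost_center": "cc-1234", "pipeline": "daily_sales"}

print(inject_labels(job_config, defaults)["labels"])
```

Returning a new dict keeps the original configuration untouched, which makes the injection step easy to test in isolation.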
Clean Output
Generated DAGs are human-readable, ruff-formatted Python that you can review, version-control, and deploy with confidence.
Full CLI Toolkit
Generate, validate, list registered operators, and resolve variables — all from the command line with rich, colorized output.
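Variable resolution relies on the `${VAR}` expansion listed for `loader.py` in the project layout. A minimal sketch of that substitution follows; the `expand_vars` name, the accepted variable-name pattern, and the choice to fail on undefined variables are assumptions.

```python
import re

# Matches ${NAME} where NAME looks like an environment-style identifier.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_vars(text: str, variables: dict) -> str:
    """Replace every ${NAME} in text, failing loudly on undefined names."""
    missing = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            missing.append(name)
            return match.group(0)  # leave the placeholder as written
        return str(variables[name])

    result = _VAR_PATTERN.sub(replace, text)
    if missing:
        names = ", ".join(sorted(set(missing)))
        raise KeyError(f"undefined variables: {names}")
    return result


print(expand_vars("gs://${BUCKET}/raw/${RUN_DATE}", {"BUCKET": "sales", "RUN_DATE": "2024-01-01"}))
```

Collecting all missing names before raising reports every undefined variable in one pass instead of one per run.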
16+ Built-In Operators
BigQuery, GCS, Python, Bash, Branching, Sensors, Triggers, TaskGroups — each with dedicated Pydantic models and field-level validation.
How It Works
DagSmith generates a .py DAG file with all imports, operators, and dependencies, then formats it with ruff for consistent style and clean imports.
Supported Operators at a Glance
| Category | Operator / Sensor | Type |
|---|---|---|
| Standard | PythonOperator | operator |
| Standard | BranchPythonOperator | operator |
| Standard | BashOperator | operator |
| Standard | EmptyOperator | operator |
| Standard | TriggerDagRunOperator | operator |
| Standard | ExternalTaskSensor | sensor |
| BigQuery | BigQueryInsertJobOperator | operator |
| BigQuery | BigQueryCheckOperator | operator |
| BigQuery | BigQueryValueCheckOperator | operator |
| BigQuery | BigQueryTableExistenceSensor | sensor |
| GCS | GCSToBigQueryOperator | operator |
| GCS | GCSToGCSOperator | operator |
| GCS | GCSDeleteObjectsOperator | operator |
| GCS | GCSObjectsWithPrefixExistenceSensor | sensor |
| Utility | TaskGroup | util |
| Plugin | Any registered operator/sensor | generic |
Project Layout
dagsmith/
    src/
        dagsmith/
            __init__.py              # version
            __main__.py              # python -m dagsmith
            cli/                     # CLI: generate, validate, list, resolve
            loader.py                # YAML loading + ${VAR} expansion + validation
            code_generator.py        # renders YamlDagSpec -> .py string
            callables.py             # dotted-path -> (module, fn, alias) resolver
            dependencies.py          # >> / << dependency string parser
            cron.py                  # cron expression humanizer
            utils.py                 # py_repr, safe_var, humanize_readable_time
            constants.py             # color constants, TriggerRule enum
            configs/
                airflow_registry.yaml    # operator/sensor registry + FinOps labels
            registry/
                core.py              # loads airflow_registry.yaml, get_import_line
                models.py            # RegistryEntry, RegistryConfig Pydantic models
            schemas/
                __init__.py          # YamlDagSpec root model, discriminated unions
                base.py              # BaseTaskSpec, BaseSensorOperatorSpec, DagSpec
                generic.py           # GenericOperatorSpec, GenericSensorSpec
                shared_renderers.py  # render_common_fields
                bigquery/            # BQ operator/sensor specs + renderers
                gcs/                 # GCS operator/sensor specs + renderers
                standard/            # PythonOperator, BashOperator, etc.
    tests/                           # mirrors src/dagsmith/ layout
    pyproject.toml
    Dockerfile
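
The `>>` / `<<` dependency strings handled by `dependencies.py` can be turned into explicit edges with a small parser. The sketch below is one possible approach, not the module's actual code: the `parse_dependency` name, the bracketed-list syntax, and the edge-tuple output format are assumptions. It follows Airflow's chaining semantics, where each operator links the two groups on either side of it.

```python
import re


def _parse_group(token: str) -> list[str]:
    """Parse 'a' or '[a, b]' into a list of task ids."""
    token = token.strip()
    if token.startswith("[") and token.endswith("]"):
        token = token[1:-1]
    names = [name.strip() for name in token.split(",") if name.strip()]
    if not names:
        raise ValueError("empty task group in dependency expression")
    return names


def parse_dependency(expr: str) -> list[tuple[str, str]]:
    """Turn 'a >> [b, c] >> d' or 'd << a' into (upstream, downstream) edges."""
    # The capturing group keeps the operators in the split result:
    # 'a >> b << c' -> ['a', '>>', 'b', '<<', 'c']
    parts = re.split(r"\s*(>>|<<)\s*", expr.strip())
    groups = [_parse_group(part) for part in parts[0::2]]
    operators = parts[1::2]

    edges = []
    for left, op, right in zip(groups, operators, groups[1:]):
        # '>>' means left runs before right; '<<' reverses the direction.
        upstream, downstream = (left, right) if op == ">>" else (right, left)
        edges.extend((up, down) for up in upstream for down in downstream)
    return edges


print(parse_dependency("extract >> [clean, audit] >> load"))
```

Normalizing both operators into `(upstream, downstream)` pairs means the code generator only has to emit one form of dependency statement.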