bsushmith/airflow-pre-commit

Airflow DAG Validator Pre-commit Hook

Reusable pre-commit hook for validating Airflow DAG files without importing Airflow. The validator parses DAG files with Python's ast module, so it can run on developer machines and in CI without a full Airflow runtime.
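
To illustrate why AST parsing avoids the Airflow dependency, here is a minimal sketch that extracts literal dag_id values from source text. It is not the hook's actual implementation: the function name find_dag_ids and the sample DAG are assumptions made for this example.

```python
import ast

# Sample DAG source. It is only parsed, never executed,
# so Airflow does not need to be installed.
SOURCE = '''
from airflow import DAG

with DAG(dag_id="dp_daily_load", schedule="@daily") as dag:
    pass
'''


def find_dag_ids(source: str) -> list[str]:
    """Return literal dag_id values passed to DAG(...) calls (hypothetical helper)."""
    dag_ids = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both DAG(...) and module.DAG(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DAG":
            continue
        # dag_id given as a keyword argument.
        for kw in node.keywords:
            if (
                kw.arg == "dag_id"
                and isinstance(kw.value, ast.Constant)
                and isinstance(kw.value.value, str)
            ):
                dag_ids.append(kw.value.value)
        # dag_id given as the first positional argument.
        if (
            node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            dag_ids.append(node.args[0].value)
    return dag_ids


print(find_dag_ids(SOURCE))  # ['dp_daily_load']
```

A static approach like this only sees literal values; a dag_id built at runtime (for example from a variable or an f-string) would not be captured by this sketch.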

Use In Another Repo

Add this to the consuming repo's .pre-commit-config.yaml:

repos:
  - repo: https://github.com/bsushmith/airflow-pre-commit
    rev: v1.0.0
    hooks:
      - id: airflow-dag-validator
        args: [--config, .dag-validator.toml, --show-warnings]

For local testing before publishing this repo:

repos:
  - repo: /absolute/path/to/airflow-pre-commit
    rev: HEAD
    hooks:
      - id: airflow-dag-validator
        args: [--config, .dag-validator.toml, --show-warnings]

Then run:

pre-commit install
pre-commit run airflow-dag-validator --all-files

By default, the hook validates Python files under dags/.

Configure Rules

Create .dag-validator.toml in the consuming repo:

[checks]
dag_id_present = true
owner_present = true
schedule_present = true
start_date_present = true
airflow_imports = true
owner_naming_convention = false
dag_tags_present = false
task_failure_callback = false

[output]
show_warnings = true
show_summary = false
quiet_success = true
format = "text"

[paths]
default_dag_folder = "dags"

The CLI auto-detects .dag-validator.toml, dag-validator.toml, or .airflow-dag-validator.toml. You can also pass --config path/to/config.toml.
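
The lookup described above could work roughly as in the following sketch. The function find_config is hypothetical, and two details are assumptions rather than documented behavior: that the files are searched in the order listed, and that the search happens in a single root directory.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Candidate file names, in the order the README lists them.
# Treating this order as the precedence is an assumption.
CANDIDATES = (
    ".dag-validator.toml",
    "dag-validator.toml",
    ".airflow-dag-validator.toml",
)


def find_config(root: Path, explicit: Optional[str] = None) -> Optional[Path]:
    """Return the config path to use, or None if nothing is found."""
    # An explicit --config value always wins over auto-detection.
    if explicit is not None:
        return Path(explicit)
    for name in CANDIDATES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


# Demo in a throwaway directory containing two candidate files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dag-validator.toml").write_text("")
    (root / ".dag-validator.toml").write_text("")
    print(find_config(root).name)  # .dag-validator.toml
```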

quiet_success = true keeps pre-commit output readable when pre-commit splits many DAG files into batches: successful batches stay silent, while failed batches still print errors.

CLI flags override config where applicable:

dag-validator --disable-check task_failure_callback dags/example.py
dag-validator --enable-check task_failure_callback --fail-on-warnings dags/example.py
dag-validator --checks core --dag-folder dags
dag-validator --list-checks

TOML To CLI Mapping

Every active config value can also be set from the CLI:

| TOML key | CLI flag |
| --- | --- |
| validation.default_severity = "error" | --default-severity error |
| checks.<check_name> = true | --enable-check <check_name> |
| checks.<check_name> = false | --disable-check <check_name> |
| check_settings.task_failure_callback.strict = true | --task-failure-callback-strict |
| check_settings.task_failure_callback.strict = false | --no-task-failure-callback-strict |
| check_settings.task_failure_callback.exempt_operators = [...] | repeat --task-failure-callback-exempt-operator <operator> |
| check_settings.owner_present.valid_owners = [...] | repeat --valid-owner <regex> |
| check_settings.owner_naming_convention.pattern = "..." | --owner-naming-pattern <regex> |
| custom_checks.modules = [...] | repeat --custom-check-module <module> |
| output.format = "json" | --output-format json or --json |
| output.format = "text" | --output-format text or --no-json |
| output.show_warnings = true | --show-warnings |
| output.show_warnings = false | --no-show-warnings |
| output.show_summary = true | --summary |
| output.show_summary = false | --no-summary |
| output.quiet_success = true | --quiet-success |
| output.quiet_success = false | --no-quiet-success |
| paths.default_dag_folder = "dags" | --dag-folder dags |
| paths.exclude_patterns = [...] | repeat --exclude-pattern <glob> |
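
One common way to get paired flags such as --show-warnings / --no-show-warnings that override a config value only when given is argparse.BooleanOptionalAction with a default of None. The sketch below shows that pattern for two of the output settings. The function names are hypothetical, and whether dag-validator is implemented this way is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser covering two of the output flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dag-validator-sketch")
    # default=None lets us tell "flag not given" apart from an explicit True/False.
    parser.add_argument(
        "--show-warnings", action=argparse.BooleanOptionalAction, default=None
    )
    parser.add_argument("--output-format", choices=["text", "json"], default=None)
    return parser


def resolve_output(config_output: dict, argv: list[str]) -> dict:
    """Overlay CLI values on the [output] config table without mutating it."""
    args = build_parser().parse_args(argv)
    resolved = dict(config_output)
    if args.show_warnings is not None:
        resolved["show_warnings"] = args.show_warnings
    if args.output_format is not None:
        resolved["format"] = args.output_format
    return resolved


config = {"show_warnings": True, "format": "text"}
print(resolve_output(config, ["--no-show-warnings"]))
# {'show_warnings': False, 'format': 'text'}
print(resolve_output(config, []))
# {'show_warnings': True, 'format': 'text'}
```

Keeping the CLI defaults at None means a value from the TOML file is replaced only when the corresponding flag appears on the command line.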

Custom Checks

Custom checks can live in the consuming repo. Add module names in .dag-validator.toml:

[custom_checks]
modules = ["dag_validation_checks"]

or pass them on the CLI:

dag-validator --custom-check-module dag_validation_checks --list-checks

Then create dag_validation_checks.py in the consuming repo:

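import ast
from pathlib import Path
from typing import Any

from dag_validator.checks import BaseCheck


class DagIdPrefixCheck(BaseCheck):
  """Custom check: DAG IDs must start with the dp_ prefix."""

  @property
  def name(self) -> str:
    # Key used under [checks] and with --enable-check / --disable-check.
    return "dag_id_prefix"

  @property
  def description(self) -> str:
    return "Ensures DAG IDs start with dp_"

  @property
  def severity(self) -> str:
    return "error"

  def check(self, dag_info: dict[str, Any], tree: ast.AST, file_path: Path) -> list[str]:
    # Return one message per violation; an empty list means the file passes.
    dag_id = dag_info.get("dag_id")
    # Only flag files that define a DAG object with a known dag_id.
    if dag_info.get("has_dag_object") and dag_id and not dag_id.startswith("dp_"):
      return [f"DAG in {file_path} has dag_id '{dag_id}' without dp_ prefix"]
    return []


# Module-level list of check instances exposed to the validator.
CHECKS = [DagIdPrefixCheck()]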
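
For intuition, a loader for modules like this one might import each configured module by name and collect its CHECKS list. The sketch below is an assumption about the general mechanism, not the validator's actual code: load_custom_checks, the stand-in module name, and the stand-in check class are all hypothetical.

```python
import importlib
import sys
import types


def load_custom_checks(module_names: list[str]) -> list:
    """Import each module by name and collect the objects in its CHECKS list."""
    checks = []
    for module_name in module_names:
        module = importlib.import_module(module_name)
        # A module without CHECKS simply contributes nothing.
        checks.extend(getattr(module, "CHECKS", []))
    return checks


# Stand-in for a real checks module so the sketch runs on its own.
class _DemoCheck:
    name = "dag_id_prefix"


demo_module = types.ModuleType("dag_validation_checks_demo")
demo_module.CHECKS = [_DemoCheck()]
sys.modules["dag_validation_checks_demo"] = demo_module

loaded = load_custom_checks(["dag_validation_checks_demo"])
print([check.name for check in loaded])  # ['dag_id_prefix']
```

With a loader of this kind, the module named in custom_checks.modules would need to be importable from wherever the hook runs, for example by sitting at the root of the consuming repo if that directory is on the import path.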

Custom checks use the same enable/disable controls as built-in checks:

[checks]
dag_id_prefix = true

dag-validator --enable-check dag_id_prefix
dag-validator --disable-check dag_id_prefix

Built-in Checks

Core checks:

  • dag_id_present: every DAG has a DAG ID.
  • owner_present: every DAG has an owner in default_args.
  • schedule_present: every DAG has schedule or schedule_interval.
  • start_date_present: every DAG has start_date.

Quality checks:

  • airflow_imports: warns when a DAG file appears to be missing Airflow imports.
  • owner_naming_convention: warns when owner does not match the configured naming pattern.
  • dag_tags_present: warns when a DAG does not define tags.

Alerting checks:

  • task_failure_callback: warns when non-empty operator tasks lack on_failure_callback, unless the DAG or its default_args already defines one.

Local Development

uv --cache-dir .uv-cache run pytest tests -q
uv --cache-dir .uv-cache run dag-validator --list-checks

The installed console command is:

dag-validator
