Why Advanced dbt Patterns Matter
Basic dbt — incremental models, schema tests, simple ref() chains — only takes analytics engineering teams so far. In production, with dozens of engineers and hundreds of models, the cracks start to show. Copy-paste SQL proliferates as teams replicate business logic across models because there is no shared macro library. Test coverage remains shallow because the four built-in generic tests (not_null, unique, accepted_values, relationships) cannot express complex business rules. Multi-environment deployments break because dev, staging, and production schemas are handled with ad-hoc workarounds rather than first-class dbt constructs. CI pipelines run the full model graph on every pull request — a process that takes twenty minutes and gets skipped under deadline pressure.
Advanced dbt patterns address each of these failure modes systematically. Jinja2 macros with the dispatch mechanism replace copy-paste SQL with parametric, adapter-aware functions. dbt-expectations and custom generic tests give teams the expressivity to encode business-specific data contracts directly in YAML. Profiles with multiple targets and env_var() injection make environment promotion a configuration concern, not a code change. Slim CI with state:modified+ selection cuts CI runtime from twenty minutes to two by running only the models affected by a given PR.
Note
Advanced Macro Development — Jinja2, Dispatch, and Global Overrides
dbt macros are Jinja2 functions stored in the macros/ directory and called from model SQL, other macros, or schema YAML. Beyond simple string-interpolation helpers, macros enable DRY SQL generation for patterns that repeat across models: safe division, surrogate key hashing, audit column injection, and environment-specific schema routing. The Jinja2 templating engine gives macros the full power of Python-like control flow — loops, conditionals, variable scoping — which makes them far more capable than SQL views or CTEs.
The dispatch Mechanism
The dispatch mechanism lets you write adapter-specific implementations of the same macro and have dbt select the correct one at compile time based on the target adapter (BigQuery, Snowflake, Redshift, DuckDB, etc.). dbt-utils uses dispatch extensively — every macro like dbt_utils.surrogate_key has adapter-specific variants (e.g., bigquery__surrogate_key, default__surrogate_key) that get resolved at compile time. You can override any dispatched macro in your own project by creating a macro with the naming convention <adapter>__<macro_name> or by adding your package to the dispatch search order indbt_project.yml.
-- macros/safe_divide.sql
-- Safe division macro: returns null instead of raising a ZeroDivisionError.
-- Usage: {{ safe_divide('revenue', 'sessions') }}
-- Adapter-aware: BigQuery uses SAFE_DIVIDE(), others use CASE WHEN.
{% macro safe_divide(numerator, denominator) %}
{{ return(adapter.dispatch('safe_divide', 'my_project')(numerator, denominator)) }}
{% endmacro %}
{% macro default__safe_divide(numerator, denominator) %}
CASE
WHEN {{ denominator }} = 0 OR {{ denominator }} IS NULL THEN NULL
ELSE {{ numerator }} / NULLIF({{ denominator }}, 0)
END
{% endmacro %}
{% macro bigquery__safe_divide(numerator, denominator) %}
SAFE_DIVIDE({{ numerator }}, {{ denominator }})
{% endmacro %}
{% macro snowflake__safe_divide(numerator, denominator) %}
IFF({{ denominator }} = 0 OR {{ denominator }} IS NULL, NULL,
{{ numerator }} / {{ denominator }})
{% endmacro %}
-- macros/generate_schema_name.sql
-- Override the default schema name resolution.
-- In production: use the custom schema as-is (no prefix).
-- In dev/staging: prefix with the target name to isolate developer schemas.
-- This prevents dev models from writing into production schemas.
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- elif target.name == 'prod' -%}
{#- In production, use the exact custom schema — no target prefix -#}
{{ custom_schema_name | trim }}
{%- else -%}
{#- In dev/staging, prefix with target schema to isolate -#}
{{ default_schema }}_{{ custom_schema_name | trim }}
{%- endif -%}
{%- endmacro %}
-- macros/create_audit_table.sql
-- Uses run_query to execute DDL inside a macro.
-- Creates an audit log table if it does not already exist.
-- Call from an on-run-start hook in dbt_project.yml.
{% macro create_audit_table() %}
{% set create_sql %}
CREATE TABLE IF NOT EXISTS {{ target.schema }}.dbt_audit_log (
run_started_at TIMESTAMP,
model_name STRING,
rows_affected INT64,
invocation_id STRING
)
{% endset %}
{% if execute %}
{% set result = run_query(create_sql) %}
{{ log("Audit table created or already exists", info=True) }}
{% endif %}
{% endmacro %}Note
{% if execute %} guard inside macros that call run_query is critical. dbt parses and compiles every macro during the parse phase — before any SQL is executed. Without the guard, dbt will attempt to run your DDL statement during parsing, which fails because no database connection is available at that point. Always wrap run_query calls in {% if execute %}.dbt Packages — dbt-utils, dbt-expectations, and Writing Your Own
dbt packages are reusable collections of models, macros, and tests installed via the dbt Hub or directly from a Git repository. They are declared in packages.yml at the project root and installed with dbt deps. Two packages are nearly universal in production dbt projects: dbt-utils for SQL utility macros and dbt-expectations for Great Expectations-style data quality tests.
dbt-utils Key Macros
The most valuable macros from dbt-utils in production are: surrogate_key for generating stable MD5-hashed primary keys from composite natural keys; star for selecting all columns from a ref while explicitly excluding a list; union_relations for safely unioning multiple tables with differing column sets (adds nulls for missing columns); get_column_values for dynamically reading distinct values from a column at compile time (used to build dynamic pivots); and pivot for generating column-per-value pivot SQL without manual repetition.
# packages.yml — install at project root, run: dbt deps
packages:
- package: dbt-labs/dbt_utils
version: [">=1.1.0", "<2.0.0"]
- package: calogica/dbt_expectations
version: [">=0.10.0", "<1.0.0"]
# Install a private package from GitHub
- git: "https://github.com/your-org/dbt-internal-utils.git"
revision: v1.2.0
-- models/fct_orders.sql
-- Using surrogate_key to generate a stable, adapter-agnostic primary key.
-- surrogate_key concatenates the columns and hashes the result with MD5.
SELECT
{{ dbt_utils.surrogate_key(['order_id', 'source_system']) }} AS order_sk,
order_id,
source_system,
customer_id,
order_date,
total_amount_cents,
status
FROM {{ ref('stg_orders') }}
-- models/fct_events_pivoted.sql
-- Dynamic pivot using get_column_values + pivot.
-- Reads distinct event_type values at compile time — no hardcoding required.
{%- set event_types = dbt_utils.get_column_values(
ref('stg_events'),
'event_type',
default=[]
) -%}
SELECT
user_id,
date_trunc('day', event_at) AS event_date,
{{ dbt_utils.pivot(
'event_type',
event_types,
agg='count',
then_value=1,
else_value=0,
suffix='_count'
) }}
FROM {{ ref('stg_events') }}
GROUP BY 1, 2
# schema.yml — dbt-expectations tests
# Great Expectations-style assertions in YAML
version: 2
models:
- name: fct_orders
columns:
- name: order_sk
tests:
- not_null
- unique
- name: total_amount_cents
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 10000000 # 100,000 USD in cents — flag absurd values
config:
severity: warn
- name: order_date
tests:
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: date
- name: fct_order_items
tests:
# Table-level test: unit_price must always be <= order_item_total
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: order_item_total_cents
column_B: unit_price_cents
or_equal: trueWriting and Publishing Your Own Package
Internal packages let you share macros, generic tests, and even staging models across multiple dbt projects in your organization. The structure mirrors a regular dbt project: a dbt_project.yml declaring the package name and version, a macros/ directory with shared helpers, and an optional macros/tests/ directory for custom generic tests. Version the package with Git tags and reference specific revisions in consumer projects. For cross-adapter dispatch, define a dispatch key in dbt_project.yml to control macro resolution order.
# dbt_project.yml for your internal package (my-org/dbt-internal-utils)
name: 'dbt_internal_utils'
version: '1.0.0'
config-version: 2
# Dispatch order: project overrides come first, then this package, then dbt_utils
dispatch:
- macro_namespace: dbt_utils
search_order: ['dbt_internal_utils', 'dbt_utils']
# No models in a utilities package — macros and tests only
# models: (leave empty or omit)
# Declare this so consumers can reference it
require-dbt-version: [">=1.5.0", "<2.0.0"]Custom Generic and Singular Tests
dbt supports two categories of tests beyond the four built-in generics. Generic tests are Jinja2 SQL templates stored in macros/tests/ (or any subdirectory of macros/) that accept a model and column as arguments and return a query whose result count must be zero to pass. They are reusable across any model by referencing the test name in schema YAML. Singular tests are standalone SQL files in the tests/ directory that return failing rows — useful for complex multi-table referential integrity checks that do not generalize to a parameter template.
The store_failures configuration tells dbt to materialize the failing rows to a table in the database rather than just returning a count. This makes test failures auditable — analysts and data consumers can query the failure table to understand exactly which records violated the contract. Combined with --store-failures in CI, your pipeline writes a snapshot of every test failure to the warehouse, enabling retrospective analysis of data quality trends over time.
-- macros/tests/expect_at_least_one_value.sql
-- Custom generic test: asserts that a column contains at least one non-null,
-- non-empty occurrence of a given value.
-- Usage in schema.yml:
-- tests:
-- - expect_at_least_one_value:
-- value: 'completed'
{% test expect_at_least_one_value(model, column_name, value) %}
WITH value_check AS (
SELECT COUNT(*) AS match_count
FROM {{ model }}
WHERE {{ column_name }} = '{{ value }}'
)
SELECT match_count
FROM value_check
WHERE match_count = 0 -- test fails (returns rows) when no matches found
{% endtest %}
-- macros/tests/expect_no_orphan_fk.sql
-- Generic referential integrity test.
-- Asserts every value in column_name has a matching row in the parent table.
-- More flexible than the built-in 'relationships' test because it supports
-- multi-column joins and custom WHERE filters.
{% test expect_no_orphan_fk(model, column_name, parent_model, parent_column) %}
SELECT child.{{ column_name }}
FROM {{ model }} AS child
LEFT JOIN {{ ref(parent_model) }} AS parent
ON child.{{ column_name }} = parent.{{ parent_column }}
WHERE child.{{ column_name }} IS NOT NULL
AND parent.{{ parent_column }} IS NULL
{% endtest %}
-- tests/assert_order_items_match_orders.sql
-- Singular test: complex referential integrity check that cannot be
-- expressed as a generic template.
-- Returns rows that indicate a data integrity violation.
SELECT
oi.order_item_id,
oi.order_id,
oi.created_at
FROM {{ ref('fct_order_items') }} AS oi
LEFT JOIN {{ ref('fct_orders') }} AS o
ON oi.order_id = o.order_id
WHERE o.order_id IS NULL
-- Only check recent data to avoid performance issues on large tables
AND oi.created_at >= CURRENT_DATE - INTERVAL '30 days'
# schema.yml — using custom tests with severity and store_failures
version: 2
models:
- name: fct_orders
tests:
# Table-level singular test equivalent via generic test
- expect_at_least_one_value:
column_name: status
value: 'completed'
config:
severity: warn # warn instead of error so CI still passes
columns:
- name: customer_id
tests:
- expect_no_orphan_fk:
parent_model: dim_customers
parent_column: customer_id
config:
severity: error
store_failures: true # materialize failures to warehouse
store_failures_as: table # 'table' or 'view'Note
store_failures: true is set, dbt writes failing rows to a table in a schema named <target_schema>_dbt_test__audit by default. You can customize the failure schema with the schema config key on the test. Run dbt test --store-failures in CI to enable this globally without modifying every test YAML. These failure tables are incremental by default in dbt 1.6+ — they append new failures rather than truncating, giving you a historical audit trail.Multi-Environment Deployments — Profiles, Targets, and Vars
dbt's profiles.yml defines named connection profiles with multiple targets — one per environment (dev, staging, prod). Each target specifies the database connection parameters, the target schema, and any environment-specific settings. The active target is selected at runtime via the --target flag or the DBT_TARGET environment variable. Inside model SQL and macros, the {{ target.name }} variable resolves to the name of the active target, enabling conditional logic for environment-specific behavior.
For secrets and deployment-specific parameters, use env_var() to inject environment variables at compile time. This avoids hardcoding credentials or environment-specific values in version-controlled files. Combine env_var() with var() for values that should be overridable both via environment and CLI: {{ var('start_date', env_var('DBT_START_DATE', '2024-01-01')) }}.
# profiles.yml (~/.dbt/profiles.yml or project root for dbt Cloud)
# Three-environment setup: dev, staging, prod
# Credentials always come from env vars — never hardcoded.
analytics:
target: dev # default target when --target is not specified
outputs:
dev:
type: bigquery
method: oauth # use gcloud auth for local dev
project: my-project-dev
dataset: "{{ env_var('DBT_SCHEMA', 'analytics_dev') }}"
location: EU
threads: 4
timeout_seconds: 300
staging:
type: bigquery
method: service-account
project: my-project-staging
dataset: analytics_staging
keyfile: "{{ env_var('GOOGLE_APPLICATION_CREDENTIALS') }}"
location: EU
threads: 8
timeout_seconds: 600
prod:
type: bigquery
method: service-account
project: my-project-prod
dataset: analytics
keyfile: "{{ env_var('GOOGLE_APPLICATION_CREDENTIALS') }}"
location: EU
threads: 16
timeout_seconds: 900
# dbt_project.yml — environment-specific model configs and grants
name: 'analytics'
version: '1.0.0'
config-version: 2
profile: 'analytics'
model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
models:
analytics:
# Staging layer: views everywhere
staging:
+materialized: view
+schema: staging
# Intermediate layer: ephemeral in dev (no warehouse cost), table in prod
intermediate:
+materialized: "{{ 'ephemeral' if target.name == 'dev' else 'table' }}"
# Marts: tables in prod and staging, views in dev for fast iteration
marts:
+materialized: "{{ 'table' if target.name in ['prod', 'staging'] else 'view' }}"
+schema: marts
# Grant SELECT to the reporting role in production only
+grants:
select:
- "{{ 'reporting_role' if target.name == 'prod' else 'dev_role' }}"
# on-run-start hooks run before any models execute
on-run-start:
- "{{ create_audit_table() }}"
-- models/marts/fct_revenue.sql
-- Model that branches behavior based on target environment.
-- In prod: scan the last 90 days only for cost efficiency.
-- In dev: scan only the last 7 days to keep query cost low.
{% set lookback_days = 90 if target.name == 'prod' else 7 %}
{% set source_project = env_var('DBT_SOURCE_PROJECT', 'my-project-dev') %}
SELECT
date_trunc('{{ var("grain", "day") }}', event_at) AS period,
tenant_id,
SUM(revenue_cents) / 100.0 AS revenue_usd,
COUNT(DISTINCT order_id) AS order_count,
COUNT(DISTINCT customer_id) AS unique_customers
FROM `{{ source_project }}.raw.orders`
WHERE event_at >= CURRENT_DATE - INTERVAL {{ lookback_days }} DAY
AND status = 'completed'
GROUP BY 1, 2Note
grants is set on a model, dbt executes the appropriate GRANT SELECT ON ... statement after each model run — keeping access control co-located with model definition. In BigQuery, grants map to IAM bindings. In Snowflake and Redshift, they map to SQL GRANT statements. dbt does not revoke grants by default; set +grant_access_to carefully for strict access control scenarios.Snapshots — SCD Type 2 for Slowly Changing Dimensions
dbt snapshots implement Slowly Changing Dimension Type 2 (SCD2): they track historical changes to source records by appending rows with validity windows instead of overwriting. Each snapshot execution compares the current state of the source against the snapshot table and inserts new rows for changed records, closing the previous row's validity window. The result is a table where every version of every record is preserved, queryable as-of any point in time using the dbt_valid_from and dbt_valid_to columns added by dbt.
Two strategies are available. The timestamp strategy uses an updated_at column on the source: a record is considered changed when its updated_at is later than the last snapshot. This is efficient and reliable when sources maintain accurate update timestamps. The check strategy compares a specified list of columns value-by-value: a record is considered changed when any watched column differs from the previous snapshot. Use the check strategy for sources that do not have reliable update timestamps.
-- snapshots/snp_customers.sql
-- SCD Type 2 snapshot of the customers table using timestamp strategy.
-- Run with: dbt snapshot
{% snapshot snp_customers %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at', -- source column indicating last modification
invalidate_hard_deletes=True -- close validity window for deleted records
)
}}
SELECT
customer_id,
email,
first_name,
last_name,
plan_tier, -- SCD: we want history when customers upgrade/downgrade
country_code,
created_at,
updated_at
FROM {{ source('crm', 'customers') }}
{% endsnapshot %}
-- snapshots/snp_product_prices.sql
-- Check strategy: no reliable updated_at, so we watch specific columns.
{% snapshot snp_product_prices %}
{{
config(
target_schema='snapshots',
unique_key='product_id',
strategy='check',
check_cols=['price_cents', 'currency', 'active'],
invalidate_hard_deletes=True
)
}}
SELECT
product_id,
sku,
price_cents,
currency,
active
FROM {{ source('catalog', 'products') }}
{% endsnapshot %}
-- models/marts/fct_revenue_with_plan.sql
-- Join against the customers snapshot for point-in-time plan tier.
-- For each order, look up the customer's plan tier AT THE TIME of the order —
-- not their current plan tier. This is the core value of SCD2 snapshots.
SELECT
o.order_id,
o.customer_id,
o.order_date,
o.total_amount_cents,
-- Point-in-time plan tier: what plan was the customer on when they ordered?
c.plan_tier AS plan_tier_at_order_time,
-- Current plan tier for comparison
c_current.plan_tier AS current_plan_tier
FROM {{ ref('fct_orders') }} AS o
-- Join to the snapshot with validity window for point-in-time accuracy
LEFT JOIN {{ ref('snp_customers') }} AS c
ON o.customer_id = c.customer_id
AND o.order_date >= c.dbt_valid_from
AND (o.order_date < c.dbt_valid_to OR c.dbt_valid_to IS NULL)
-- Join to the snapshot for current state (dbt_valid_to IS NULL = current record)
LEFT JOIN {{ ref('snp_customers') }} AS c_current
ON o.customer_id = c_current.customer_id
AND c_current.dbt_valid_to IS NULLNote
dbt snapshot && dbt run in your CI/CD pipeline or orchestrator. Snapshots are append-only operations that read from sources, so they should execute first to capture the current state before any transformations. Avoid running snapshots more frequently than your source data changes — hourly snapshots of a table that updates once a day simply add unnecessary rows with identical data.Slim CI with State-Based Selection
In large dbt projects, running the full model graph on every pull request is prohibitively expensive — both in time and warehouse cost. Slim CI solves this by using dbt's state-based selectionto run only the models changed in the PR and their downstream dependents. The mechanism relies on comparing the current project's manifest.json (produced by the current run) against the manifest.json from the previous successful run on the target branch. dbt compares model hashes to determine which models have changed.
The state:modified+ selector matches models whose SQL, config, or upstream sources have changed, plus all of their downstream dependents (+ suffix). Combined with dbt build (which runs models, tests, snapshots, and seeds in topological order), dbt build --select state:modified+ gives you a minimal, safe CI execution that proves the changed models and their dependents work correctly without touching unrelated parts of the graph.
Additionally, integrate dbt source freshness into your pipeline to check that source tables have been updated within their expected freshness windows before running transformations. Stale sources are a common cause of incorrect downstream results, and catching them at the start of the pipeline avoids wasted compute on outdated data.
# .github/workflows/dbt-slim-ci.yml
# Slim CI: run only changed models and their dependents on every PR.
# Downloads the previous manifest.json from the last successful production run
# and uses it as the state comparison baseline.
name: dbt Slim CI
on:
pull_request:
branches: [main]
paths:
- 'models/**'
- 'macros/**'
- 'tests/**'
- 'snapshots/**'
- 'dbt_project.yml'
- 'packages.yml'
jobs:
slim-ci:
name: dbt Slim CI
runs-on: ubuntu-latest
timeout-minutes: 30
env:
DBT_TARGET: ci
GOOGLE_APPLICATION_CREDENTIALS: /tmp/gcp-sa.json
steps:
- name: Checkout PR branch
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dbt dependencies
run: |
pip install dbt-bigquery==1.8.*
dbt deps
- name: Write GCP service account key
run: echo '${{ secrets.GCP_SA_KEY }}' > /tmp/gcp-sa.json
# Download the previous manifest from the last successful production run.
# We store it as a GitHub Actions artifact after each main branch deploy.
- name: Download previous manifest (state baseline)
uses: dawidd6/action-download-artifact@v6
with:
workflow: dbt-production.yml
branch: main
name: dbt-manifest
path: ./state/
continue-on-error: true # first run will not have a previous manifest
# Check source freshness before running any models
- name: Check source freshness
run: |
dbt source freshness --target ${{ env.DBT_TARGET }} --profiles-dir .
continue-on-error: true # warn on stale sources but don't block CI
# Run only models changed in this PR + their dependents
# --defer: use production-run results for unmodified upstream models
# so tests on changed models can reference prod data
- name: dbt build (slim CI — modified models only)
run: |
dbt build --select state:modified+ --defer --state ./state/ --target ${{ env.DBT_TARGET }} --profiles-dir . --store-failures --fail-fast
env:
DBT_SCHEMA: "ci_pr_${{ github.event.number }}"
# Upload the manifest from this run so the next PR can use it
- name: Upload manifest artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: dbt-manifest-ci
path: ./target/manifest.json
retention-days: 7
# Notify on failure
- name: Notify Slack on failure
if: failure()
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": ":x: dbt Slim CI failed on PR #${{ github.event.number }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": ":x: *dbt Slim CI failed* — PR #${{ github.event.number }} <${{ github.event.pull_request.html_url }}|View PR>"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOKNote
--defer flag is the companion to slim CI: when running only modified models, dbt cannot resolve upstream refs that were not built in this run. With --defer, dbt falls back to the production database for unbuilt upstream models — so tests on your changed model can still reference real data from the production run. This requires the state manifest to include run results (run_results.json) alongside the manifest, and the CI service account must have read access to the production dataset.Further Reading
- Jinja2 and Macros in dbt — complete reference for macro syntax, context variables, filters, the run_query function, and the dispatch mechanism for adapter-specific overrides
- dbt Hub — package registry for discovering and installing community and vendor dbt packages including dbt-utils, dbt-expectations, and adapter-specific utility packages
- State Comparison Caveats — important edge cases in state-based selection: when models are considered modified even if SQL did not change, cross-version manifest compatibility, and freshness comparisons for sources
- dbt-expectations on GitHub — full list of available test macros, configuration options, version compatibility matrix, and contribution guide for adding new expectations
- dbt-project-evaluator — official dbt Labs package that audits your project for best practices violations: missing tests, missing documentation, fan-in/fan-out model DAG anti-patterns, and direct source references in non-staging models
Work with us
Running dbt in production and hitting walls with copy-paste SQL, weak test coverage, or brittle multi-environment deployments?
We help analytics engineering teams implement advanced dbt patterns — from macro libraries and package ecosystems to custom test frameworks, multi-environment deployment strategies, SCD Type 2 snapshots, and slim CI pipelines that keep your warehouse reliable. Let’s talk.
Get in touch