Introduction
We need data quality (DQ) checks throughout the data flow to ensure the product is of the best possible quality. See this design proposal for motivation and further details. This document specifically covers setting up DQ checks on dbt-modeled data.
Stack
The DQ checks on modeled data can use a combination of:
- native dbt data tests (data_test config)
- dbt-expectations tests
- Elementary tests
The first two are used predominantly for column-level and row-level checks; Elementary is used for anomaly detection.
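As a minimal sketch of how the three sources fit together, the schema.yml fragment below puts one test from each on a hypothetical orders model. It assumes dbt 1.8 or later (older versions use the tests: key instead of data_tests:), that dbt-expectations and Elementary are already installed through packages.yml, and that Elementary's own package models have been built as described in its quickstart. The model, column names and regex are made up for illustration.

```yaml
# models/schema.yml (model and column names are hypothetical)
version: 2

models:
  - name: orders
    data_tests:
      # Elementary: anomaly detection on row volume across dbt runs
      - elementary.volume_anomalies:
          timestamp_column: created_at
    columns:
      - name: order_id
        data_tests:
          # Native dbt generic test: column-level check
          - not_null
      - name: customer_email
        data_tests:
          # dbt-expectations: row-level pattern check (illustrative regex)
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[^@]+@[^@]+$"
```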
Steps
Axe Grinding
Before writing any code, think through what the modeled data represents and its business consequences. Then use this template to write out which checks the model needs and the config for each of them.
Typically, the checks fall into five types:
- Nullability and Uniqueness enforcement
- Check constraints
- Value in range or set of values
- Value distribution
- Anomalies over time / across dbt runs
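The five check types above could map onto tests roughly as follows. This is a sketch, not a prescribed config: the orders model, its columns, the allowed status values and every numeric threshold are hypothetical and should come out of the prep exercise. Test names and arguments follow the dbt-expectations and Elementary documentation, but verify them against the installed package versions; recent dbt versions may also warn that generic test arguments should be nested under an arguments: key, whereas the flat form shown here is the long-standing syntax.

```yaml
# models/schema.yml: one illustrative test per check type.
# Model, column names, allowed values and thresholds are hypothetical.
version: 2

models:
  - name: orders
    data_tests:
      # 2. Check constraint: shipped_at must not precede ordered_at
      - dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
          column_A: shipped_at
          column_B: ordered_at
          or_equal: true
          row_condition: "shipped_at is not null"
    columns:
      - name: order_id
        data_tests:
          # 1. Nullability and uniqueness enforcement
          - not_null
          - unique
      - name: status
        data_tests:
          # 3. Value in a set of allowed values
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned']
      - name: amount
        data_tests:
          # 3. Value in range
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000
          # 4. Value distribution: the column mean stays within bounds
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 20
              max_value: 200
          # 5. Anomalies over time: Elementary compares per-time-bucket
          #    metrics against what it observed in earlier data
          - elementary.column_anomalies:
              column_anomalies:
                - null_count
                - average
              timestamp_column: ordered_at
```

Fixed thresholds such as the mean bounds are the part most likely to need tuning; Elementary's column_anomalies instead derives its expected range from historical data, which is why it suits checks across dbt runs.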
References
- People: Ramon, Trevor
- Docs: TBA
Implementation
- Edit the dbt schema.yml for the corresponding model to implement the checks identified in the prep above
- Test locally
  - To run only the tests that carry a specific tag:
    dbt test --select tag:<tag>
- Code review and iterate
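Tag-based selection only works if the tests carry tags. A minimal sketch, assuming a hypothetical tag named dq_orders on a hypothetical orders model, sets tags (and severity) in each test's config block:

```yaml
# models/schema.yml: tag tests so they can be selected together.
# The model, column and tag name dq_orders are hypothetical.
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique:
              config:
                tags: ['dq_orders']
                severity: error   # a failure fails the dbt test command
          - not_null:
              config:
                tags: ['dq_orders']
                severity: warn    # a failure is reported as a warning only
```

With this in place, dbt test --select tag:dq_orders should run just these two tests. Choosing severity per test lets critical checks block a run while softer checks only surface warnings during review.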