Introduction

We need data quality (DQ) checks throughout the data flow to ensure the product is of the best quality. See this design proposal for motivation and further details. This document specifically covers setting up DQ checks on dbt-modeled data.

Stack

The DQ checks on modeled data can use a combination of

  1. native dbt data tests (data_tests config)
  2. dbt-expectations tests
  3. elementary tests

The first two are predominantly used for column-level and row-level checks. Elementary is used for anomaly detection.
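As a rough sketch of how the two non-native packages might be pulled into a project, a packages.yml could look like the following. The package names are the ones published on the dbt package hub; the version pins are assumptions for illustration only and should be checked against the hub and your dbt version before use.

```yaml
# packages.yml (sketch; version pins are illustrative assumptions)
packages:
  # Column-level and row-level expectation tests
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]

  # Anomaly detection tests
  - package: elementary-data/elementary
    version: [">=0.16.0", "<0.17.0"]
```

After editing packages.yml, run dbt deps to install the packages. Elementary typically needs additional project-level setup (for example, a schema for its own models); follow its documentation for that part.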

Steps

Axe Grinding

Before writing any code, think through what the modeled data represents and its business consequences. Then use this template to write out which checks need to be placed on the model and the config for those checks.

Typically, there are five types of checks

  1. Nullability and Uniqueness enforcement
  2. Check constraints
  3. Value in range or set of values
  4. Value distribution
  5. Anomalies over time / across dbt runs
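To make the check types above concrete, here is a minimal schema.yml sketch that maps each one to a test from the stack. The model name (orders), its columns, and all thresholds are hypothetical and chosen only for illustration. The data_tests key assumes dbt 1.8 or later; older versions use tests instead.

```yaml
# schema.yml (sketch; model, columns, and thresholds are hypothetical)
version: 2

models:
  - name: orders
    data_tests:
      # 2. Check constraint across columns (dbt-expectations)
      - dbt_expectations.expression_is_true:
          expression: "shipped_at >= ordered_at"
      # 5. Anomalies over time / across dbt runs (elementary)
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
    columns:
      - name: order_id
        data_tests:
          # 1. Nullability and uniqueness enforcement (native dbt)
          - not_null
          - unique
      - name: status
        data_tests:
          # 3. Value in a set of values (native dbt)
          - accepted_values:
              values: ["placed", "shipped", "returned"]
      - name: amount
        data_tests:
          # 3. Value in range (dbt-expectations)
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000
          # 4. Value distribution (dbt-expectations)
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 10
              max_value: 500
```

Which test belongs on which column is a judgment call that should come out of the template exercise; the sketch only shows where each kind of check would sit in the file.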

References

  1. People: Ramon, Trevor
  2. Docs: TBA

Implementation

  1. Edit the dbt schema.yml for the corresponding model to implement checks based on the above prep
  2. Test locally
  3. To run only the tests carrying a given tag: dbt test --select tag:<tag>
  4. Code review and iterate
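Tag-based selection only works if the tests carry a tag. A minimal sketch of tagging a test in schema.yml, assuming a hypothetical model orders and a hypothetical tag name dq_critical:

```yaml
# schema.yml (sketch; model name and tag name are hypothetical)
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - not_null:
              config:
                tags: ["dq_critical"]
          - unique:
              config:
                tags: ["dq_critical"]

# Run only the tagged tests locally with:
#   dbt test --select tag:dq_critical
```

Applying the tag per test keeps the selection narrow. If every test on a model should share a tag, the tag can instead be set at a broader level, at the cost of less precise selection.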

References / Examples