| 1 | +# dbt Model Usage |
| 2 | +This [dbt](https://docs.getdbt.com/) package provides tests that tell you whether your models are still relevant to your users. The tests scan your database's query logs to check whether users are still running `SELECT` statements against the tables dbt produces. A failing test signals that it might be time to retire an unused model. |
| 3 | + |
| 4 | +<br> |
| 5 | + |
| 6 | +# Database Support |
| 7 | +This package currently supports Google BigQuery and Snowflake. |
| 8 | + |
| 9 | +<br> |
| 10 | + |
| 11 | +# Installation |
| 12 | +1. Add this package to your project's `packages.yml` |
| 13 | + ```yaml |
| 14 | +   packages: |
| 15 | +     - git: "https://github.com/rjh336/dbt-model-usage.git" |
| 16 | +       revision: 0.1.0 |
| 17 | + ``` |
| 18 | +2. Update dependencies in your project |
| 19 | + ```bash |
| 20 | + $ dbt deps |
| 21 | + ``` |
| 22 | + |
| 23 | +<br> |
| 24 | + |
| 25 | +# Setup |
| 26 | + |
| 27 | +## In your project |
| 28 | +You can configure this package via the [vars](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables) config in your `dbt_project.yml`: |
| 29 | + |
| 30 | +```yaml |
| 31 | +# dbt_project.yml |
| 32 | + |
| 33 | +vars: |
| 34 | +  # BIGQUERY ONLY: |
| 35 | +  model_usage_dbt_query_comment_pattern: 'my-custom-query-comment-regex' |
| 36 | + |
| 37 | +  # SNOWFLAKE ONLY: |
| 38 | +  model_usage_dbt_query_tag_pattern: 'my-custom-query-tag-regex' |
| 39 | +``` |
| 40 | + |
| 41 | +- `model_usage_dbt_query_comment_pattern`: Regular expression used to find and exclude queries that dbt executes, identified by their [query-comment](https://docs.getdbt.com/reference/project-configs/query-comment). These queries are normally not treated as user queries because they can run every time models are built (e.g. tests and hooks). The default value is **`^\/\*\s+\{"app"\:\s+"dbt".*`**. If your project uses a custom query-comment, you may need to supply your own pattern. To count dbt-generated queries toward a model's relevance, set this variable to `''`. |
| 42 | + |
| 43 | +- `model_usage_dbt_query_tag_pattern`: Regular expression used to find and exclude queries that dbt executes, identified by their [query_tag](https://docs.getdbt.com/reference/warehouse-profiles/snowflake-profile#query_tag). If this variable is not defined, tagged queries count as relevant user `SELECT` statements in the test results. |
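
To see what the default comment pattern catches, here is a minimal Python sketch. It is an illustration only: the package applies the pattern inside the warehouse, whose regex engine may differ from Python's `re`, and the helper `is_dbt_query`, the sample queries, and the empty-pattern handling are assumptions made for this example.

```python
import re

# Default value quoted above for `model_usage_dbt_query_comment_pattern`.
DEFAULT_COMMENT_PATTERN = r'^\/\*\s+\{"app"\:\s+"dbt".*'


def is_dbt_query(query_text: str, pattern: str = DEFAULT_COMMENT_PATTERN) -> bool:
    """Return True if the query text looks dbt-generated (hypothetical helper).

    Assumption: an empty pattern disables the exclusion, so every query
    is treated as a user query.
    """
    if not pattern:
        return False
    return re.search(pattern, query_text) is not None


# A query carrying a dbt-style JSON query comment (sample values are made up).
dbt_query = (
    '/* {"app": "dbt", "dbt_version": "1.0.0", '
    '"node_id": "model.my_project.some_model"} */\n'
    "select * from analytics.some_model"
)
# A query written by a user, with no leading comment.
user_query = "select * from analytics.some_model"

print(is_dbt_query(dbt_query))              # True
print(is_dbt_query(user_query))             # False
print(is_dbt_query(dbt_query, pattern=""))  # False
```

A custom query-comment that does not start with `/* {"app": "dbt"` would not match the default, which is why a project-specific pattern may be needed.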
| 44 | + |
| 45 | +## Required Permissions |
| 46 | +### BigQuery |
| 47 | +Because the BigQuery implementation of these tests queries the [INFORMATION_SCHEMA.JOBS](https://cloud.google.com/bigquery/docs/information-schema-jobs) view, the Google Cloud user referenced in your `profiles.yml` must have the IAM permission `bigquery.jobs.listAll`. |
| 48 | + |
| 49 | +### Snowflake |
| 50 | +dbt must have permission to query the [Account Usage](https://docs.snowflake.com/en/sql-reference/account-usage.html#account-usage-views) and [Information Schema](https://docs.snowflake.com/en/sql-reference/info-schema.html) views. |
| 51 | + |
| 52 | +<br> |
| 53 | + |
| 54 | +# Available Tests |
| 55 | + |
| 56 | +**model_queried_within_last** |
| 57 | + |
| 58 | +Asserts that the target model has been queried by your users within a defined lookback time period, AND that the model was created at some point before the lookback period start time. "User queries" are defined as `SELECT` statements that were not executed by dbt. |
| 59 | + |
| 60 | +*Args*: |
| 61 | +- `num_units` - [REQUIRED] Number of time units to look back from current time. |
| 62 | +- `time_unit` - [REQUIRED] The unit of time used for the lookback period. Can be one of: "day" | "hour" | "minute" | "second". |
| 63 | +- `threshold` - [OPTIONAL] If the model's user query count within the lookback period is below this number, the test fails. Default value is 1. |
| 64 | + |
| 65 | +*Limitations*: |
| 66 | +- **BigQuery**: query jobs are retained for [up to 180 days](https://cloud.google.com/bigquery/docs/information-schema-jobs#data_retention), so setting a lookback period greater than 180 days is not supported. |
| 67 | +- **Snowflake**: account query history is retained for [up to 1 year](https://docs.snowflake.com/en/sql-reference/account-usage.html#account-usage-views), so setting a lookback period greater than 365 days is not supported. |
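
As a rough aid for choosing `num_units` and `time_unit`, the retention limits above can be checked with a few lines of Python. This is a sketch, not part of the package: the names `RETENTION_DAYS`, `lookback_period`, and `lookback_is_supported` are hypothetical.

```python
from datetime import timedelta

# Retention limits stated in the limitations above, in days.
RETENTION_DAYS = {"bigquery": 180, "snowflake": 365}

# Maps the documented `time_unit` values to timedelta keyword names.
TIMEDELTA_KWARGS = {
    "day": "days",
    "hour": "hours",
    "minute": "minutes",
    "second": "seconds",
}


def lookback_period(num_units: int, time_unit: str) -> timedelta:
    """Convert `num_units` and `time_unit` into a timedelta."""
    if time_unit not in TIMEDELTA_KWARGS:
        raise ValueError(f"unsupported time_unit: {time_unit!r}")
    return timedelta(**{TIMEDELTA_KWARGS[time_unit]: num_units})


def lookback_is_supported(num_units: int, time_unit: str, warehouse: str) -> bool:
    """Return True if the lookback fits within the warehouse's retention."""
    limit = timedelta(days=RETENTION_DAYS[warehouse])
    return lookback_period(num_units, time_unit) <= limit


print(lookback_is_supported(30, "day", "bigquery"))    # True
print(lookback_is_supported(200, "day", "bigquery"))   # False
print(lookback_is_supported(200, "day", "snowflake"))  # True
```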
| 68 | + |
| 69 | +*Usage*: |
| 70 | +```yaml |
| 71 | +# properties.yml |
| 72 | + |
| 73 | +version: 2 |
| 74 | + |
| 75 | +models: |
| 76 | +  - name: some_model |
| 77 | +    tests: |
| 78 | +      - dbt_model_usage.model_queried_within_last: |
| 79 | +          num_units: 30 |
| 80 | +          time_unit: day |
| 81 | +          threshold: 1 |
| 82 | +``` |
| 83 | +*^ This example fails if some_model was created more than 30 days ago AND fewer than 1 user query has referenced some_model in the last 30 days.* |
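
The failure condition for `model_queried_within_last` can be sketched in Python as follows. This only mirrors the logic described in this section; the package itself runs SQL against the warehouse's query logs. The function name `model_test_fails`, its arguments, and the inclusive window boundaries are assumptions made for the illustration.

```python
from datetime import datetime, timedelta

# Maps the documented `time_unit` values to timedelta keyword names.
UNIT_KWARGS = {"day": "days", "hour": "hours", "minute": "minutes", "second": "seconds"}


def model_test_fails(created_at, user_query_times, now, num_units, time_unit, threshold=1):
    """Return True when the test would fail (hypothetical helper).

    The test fails only if BOTH conditions hold:
      1. the model was created before the lookback period started, and
      2. fewer than `threshold` user queries fall inside the lookback period.
    """
    lookback_start = now - timedelta(**{UNIT_KWARGS[time_unit]: num_units})
    # Assumption: both ends of the window are inclusive.
    recent_queries = sum(1 for t in user_query_times if lookback_start <= t <= now)
    return created_at < lookback_start and recent_queries < threshold


now = datetime(2024, 1, 31)

# Old model, last queried before the 30-day window: fails.
print(model_test_fails(datetime(2023, 10, 1), [datetime(2023, 12, 15)], now, 30, "day"))  # True

# Old model, queried inside the window: passes.
print(model_test_fails(datetime(2023, 10, 1), [datetime(2024, 1, 20)], now, 30, "day"))   # False

# Model created inside the window, never queried: passes, since it is too new to judge.
print(model_test_fails(datetime(2024, 1, 15), [], now, 30, "day"))                        # False
```

The creation-time condition keeps newly built models from failing before users have had a full lookback period to query them.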
| 84 | + |
| 85 | +<br> |
| 86 | + |
| 87 | +**column_queried_within_last** |
| 88 | + |
| 89 | +Asserts that the column in the target model has been directly referenced in your users' queries within a defined lookback time period, AND that the target model was created at some point before the lookback period start time. "User queries" are defined as `SELECT` statements that were not executed by dbt. |
| 90 | + |
| 91 | +*Args*: |
| 92 | +- `num_units` - [REQUIRED] Number of time units to look back from current time. |
| 93 | +- `time_unit` - [REQUIRED] The unit of time used for the lookback period. Can be one of: "day" | "hour" | "minute" | "second". |
| 94 | +- `threshold` - [OPTIONAL] If the column's user query count within the lookback period is below this number, the test fails. Default value is 1. |
| 95 | + |
| 96 | +*Limitations*: same as for [model_queried_within_last](#model_queried_within_last) |
| 97 | + |
| 98 | +*Usage*: |
| 99 | +```yaml |
| 100 | +version: 2 |
| 101 | + |
| 102 | +models: |
| 103 | +  - name: some_model |
| 104 | +    columns: |
| 105 | +      - name: some_column |
| 106 | +        tests: |
| 107 | +          - dbt_model_usage.column_queried_within_last: |
| 108 | +              num_units: 10 |
| 109 | +              time_unit: day |
| 110 | +              threshold: 1 |
| 111 | +``` |
| 112 | +*^ This example fails if some_model was created more than 10 days ago AND fewer than 1 user query has referenced some_model.some_column in the last 10 days.* |