Commit 2307c5d

v0.1.0

0 parents. 28 files changed: 873 lines added, 0 removed.

.circleci/config.yml (+57)

```yaml
version: 2.1

jobs:

  integration-tests:
    docker:
      - image: cimg/python:3.9.9

    resource_class: small

    environment:
      DBT_PROFILES_DIR: ./integration_tests/ci
      DBT_PROJECT_DIR: ./integration_tests
      BIGQUERY_SERVICE_KEY_PATH: "/home/circleci/bigquery-service-key.json"
      GOOGLE_APPLICATION_CREDENTIALS: "/home/circleci/bigquery-service-key.json"

    steps:
      - checkout
      - run:
          name: Set dbt Profile
          command: cp $DBT_PROJECT_DIR/ci/sample.profiles.yml $DBT_PROJECT_DIR/ci/profiles.yml
      - run:
          name: Install Python packages
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install -U pip setuptools wheel
            pip install -r dev-requirements.txt
      - run:
          name: Install dbt dependencies
          command: |
            . venv/bin/activate
            dbt deps --project-dir $DBT_PROJECT_DIR
      - run:
          name: "BigQuery - GCP credentials"
          command: |
            # Quote both expansions so the JSON key is not word-split or glob-expanded.
            echo "$BIGQUERY_SERVICE_KEY" > "$BIGQUERY_SERVICE_KEY_PATH"
      - run:
          name: "BigQuery Tests"
          command: |
            . venv/bin/activate
            . scripts/run_tests.sh bigquery
      - run:
          name: "Snowflake Tests"
          command: |
            . venv/bin/activate
            . scripts/run_tests.sh snowflake

workflows:
  version: 2
  test-all:
    jobs:
      - hold:
          type: approval
      - integration-tests:
          requires:
            - hold
```
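
The "BigQuery - GCP credentials" step copies the `BIGQUERY_SERVICE_KEY` secret into the file at `BIGQUERY_SERVICE_KEY_PATH`, which the BigQuery profile then uses as its `keyfile`. If the secret is malformed, the problem only surfaces later as an authentication error. A minimal Python sketch of the same step that validates the JSON first, assuming the secret holds a service-account key as a JSON object; `write_service_key` is a hypothetical helper, not part of this repository:

```python
import json
import os
from collections.abc import Mapping
from pathlib import Path


def write_service_key(env: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical helper: write the BigQuery key secret to its configured path.

    Assumes BIGQUERY_SERVICE_KEY holds a service-account key as a JSON object
    and BIGQUERY_SERVICE_KEY_PATH holds the destination file path.
    """
    raw = env["BIGQUERY_SERVICE_KEY"]
    path = Path(env["BIGQUERY_SERVICE_KEY_PATH"])
    # Parse first so a truncated or mangled secret fails here, not inside dbt.
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    if not isinstance(json.loads(raw), dict):
        raise ValueError("BIGQUERY_SERVICE_KEY must be a JSON object")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write the secret unchanged; parsing was only a validity check.
    path.write_text(raw)
    return path
```

Failing early in the credentials step keeps the error message next to its cause instead of inside the dbt test output.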

.env.sample (+13)

```sh
#! /bin/sh
DBT_PROFILES_DIR="./integration_tests/ci"
DBT_PROJECT_DIR="./integration_tests"
GOOGLE_APPLICATION_CREDENTIALS= # path to a GCP service account used to auth dbt
BIGQUERY_SERVICE_KEY_PATH= # should be same as above value
BIGQUERY_TEST_DATABASE= # Your BigQuery Project ID
SNOWFLAKE_ACCOUNT_ID= # Your Snowflake Account ID
SNOWFLAKE_USERNAME= # Snowflake user used to auth dbt
SNOWFLAKE_PASSWORD= # Snowflake password used to auth dbt
SNOWFLAKE_ROLE="ACCOUNTADMIN"
SNOWFLAKE_WAREHOUSE= # Your Snowflake warehouse name
SNOWFLAKE_TEST_DATABASE= # Your Snowflake db name
TEST_SCHEMA= # Schema where integration test models are written
```
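
Variables with an empty value and a trailing comment must be filled in before the integration tests can connect. A rough Python sketch for listing the ones still unset, assuming every line follows this file's `NAME=value # comment` shape (it is not a general dotenv or shell parser); `parse_sample` and `missing_env_vars` are hypothetical helpers:

```python
import os
import re
from collections.abc import Mapping

# Matches assignment lines such as NAME="value" or NAME= # comment.
ASSIGNMENT = re.compile(r'^([A-Z_][A-Z0-9_]*)=("[^"]*"|[^\s#]*)')


def parse_sample(text: str) -> dict[str, str]:
    """Hypothetical helper: map each variable to its default ('' if none)."""
    values = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line.strip())
        if match:  # the shebang and comment-only lines do not match
            name, raw = match.groups()
            values[name] = raw.strip('"')
    return values


def missing_env_vars(text: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Hypothetical helper: variables with no default and no non-empty value in env."""
    return [
        name
        for name, default in parse_sample(text).items()
        if not default and not env.get(name)
    ]
```

For example, `missing_env_vars(Path(".env.sample").read_text())` (with `Path` from `pathlib`) would return the names that still need to be exported in the current shell.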

.github/CODEOWNERS (+1)

```text
* @rjh336
```

.github/issue_template.md (+20)

```markdown
## Steps to reproduce
<!---
List the steps that took you on your journey to discovering this bug! Include
hyperlinks so we can go on the same journey.
-->

## Expected results
<!---
Explain what you expected to see when you went on your journey of bug-discovery.
-->

## Actual results
<!---
Explain what you actually saw. Include log output and screenshots if available.
-->

## Extra details
<!---
Include any extra details that you think might be relevant.
-->
```

.github/pull_request_template.md (+23)

```markdown
resolves #

This is a:
- [ ] documentation update
- [ ] bug fix with no breaking changes
- [ ] new functionality
- [ ] a breaking change

All pull requests from community contributors should target the `main` branch (default).

## Description & motivation
<!---
Describe your changes, and why you're making them.
-->

## Checklist
- [ ] I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses; this helps us understand what has been covered)
    - [ ] BigQuery
    - [ ] Postgres
    - [ ] Redshift
    - [ ] Snowflake
- [ ] I have updated the README.md (if applicable)
- [ ] I have added tests & descriptions to my models (and macros if applicable)
```

.gitignore (+12)

```text
.env/
.pytest_cache/
__pycache__/
target/
dbt_modules/
dbt_packages/
logs/
venv/
profiles.yml
.python-version
.DS_Store
.env
```

README.md (+112)

# dbt Model Usage
This [dbt](https://docs.getdbt.com/) package provides tests that tell you whether your models are still relevant to your users. The tests scan your database's query logs to check whether users are still SELECTing from the tables dbt produces. Test failures can signal when it might be time to retire unused models.

<br>

# Database Support
This package currently supports Google BigQuery and Snowflake.

<br>

# Installation
1. Add this package to your project's `packages.yml`:
```yaml
packages:
  - git: "https://github.com/rjh336/dbt-model-usage.git"
    revision: 0.1.0
```
2. Update dependencies in your project:
```bash
$ dbt deps
```

<br>

# Setup

## In your project
You can configure this package via the [vars](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables) config in your `dbt_project.yml`:

```yaml
# dbt_project.yml

vars:
  # BIGQUERY ONLY:
  model_usage_dbt_query_comment_pattern: 'my-custom-query-comment-regex'

  # SNOWFLAKE ONLY:
  model_usage_dbt_query_tag_pattern: 'my-custom-query-tag-regex'
```

- `model_usage_dbt_query_comment_pattern`: Regular expression used to find and EXCLUDE queries that dbt executes with its [query-comment](https://docs.getdbt.com/reference/project-configs/query-comment). These are normally not considered 'user' queries, since they may run every time models are built (e.g. tests and hooks). The default value is **`^\/\*\s+\{"app"\:\s+"dbt".*`**. If your project uses a custom query-comment, you may want to supply your own pattern. If you prefer to count dbt-generated queries as evidence of a model's relevance, set this variable to an empty string (`''`).

- `model_usage_dbt_query_tag_pattern`: Regular expression used to find and EXCLUDE queries that dbt executes with a [query_tag](https://docs.getdbt.com/reference/warehouse-profiles/snowflake-profile#query_tag). If this variable is not defined, tagged queries count as user SELECT statements in the test results.
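
A quick way to see what the default `model_usage_dbt_query_comment_pattern` excludes is to apply it to a query that starts with a dbt query comment and to a plain user query. The Python sketch below assumes Python's `re` module interprets this pattern the same way the warehouse's regex engine does, which is likely for a pattern this simple but not guaranteed; the sample query text and the `is_dbt_query` helper are hypothetical:

```python
import re

# Default value of model_usage_dbt_query_comment_pattern, as documented above.
DEFAULT_COMMENT_PATTERN = r'^\/\*\s+\{"app"\:\s+"dbt".*'


def is_dbt_query(query_text: str, pattern: str = DEFAULT_COMMENT_PATTERN) -> bool:
    """Hypothetical helper: True if the query would be EXCLUDED as dbt-generated.

    An empty pattern excludes nothing, matching the advice to set the
    variable to '' when dbt-generated queries should count as usage.
    """
    if not pattern:
        return False
    return re.search(pattern, query_text) is not None


# Illustrative query texts; the comment mimics the shape of dbt's default query-comment.
dbt_query = '/* {"app": "dbt", "dbt_version": "1.0.0"} */\nselect * from analytics.some_model'
user_query = "select * from analytics.some_model"

print(is_dbt_query(dbt_query), is_dbt_query(user_query))  # True False
```

Because the pattern is anchored with `^`, a query only counts as dbt-generated when the comment appears at the very start of the query text.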

## Required Permissions
### BigQuery
Because the BigQuery implementation of these tests queries the [INFORMATION_SCHEMA.JOBS](https://cloud.google.com/bigquery/docs/information-schema-jobs) view, the Google Cloud user referenced in your `profiles.yml` must have the IAM permission `bigquery.jobs.listAll`.

### Snowflake
dbt must have permission to query the [Account Usage](https://docs.snowflake.com/en/sql-reference/account-usage.html#account-usage-views) and [Information Schema](https://docs.snowflake.com/en/sql-reference/info-schema.html) views.

<br>

# Available Tests

**model_queried_within_last**

Asserts that the target model has been queried by your users at least `threshold` times within a defined lookback period. Only models created before the lookback period start time can fail; a model created inside the window passes. "User queries" are defined as `SELECT` statements that were not executed by dbt.

*Args*:
- `num_units` - [REQUIRED] Number of time units to look back from the current time.
- `time_unit` - [REQUIRED] The unit of time used for the lookback period. Can be one of: "day" | "hour" | "minute" | "second".
- `threshold` - [OPTIONAL] If the model's user query count within the lookback period is below this number, the test fails. Default value is 1.

*Limitations*:
- **BigQuery**: query jobs are retained for [up to 180 days](https://cloud.google.com/bigquery/docs/information-schema-jobs#data_retention), so lookback periods longer than 180 days are not supported.
- **Snowflake**: account query history is retained for [up to 1 year](https://docs.snowflake.com/en/sql-reference/account-usage.html#account-usage-views), so lookback periods longer than 365 days are not supported.
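
Together, `num_units` and `time_unit` define a lookback window, and the limitations above cap its length per warehouse. The Python sketch below shows that arithmetic, assuming the window is measured back from the current UTC time (the README says only "from the current time"); `lookback_start` and `MAX_LOOKBACK` are hypothetical names, with the limits taken from the list above:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical table of retention limits, taken from the Limitations list above.
MAX_LOOKBACK = {
    "bigquery": timedelta(days=180),
    "snowflake": timedelta(days=365),
}

# Accepted time_unit values mapped to timedelta keyword arguments.
UNIT_KWARGS = {"day": "days", "hour": "hours", "minute": "minutes", "second": "seconds"}


def lookback_start(
    num_units: int, time_unit: str, warehouse: str, now: Optional[datetime] = None
) -> datetime:
    """Hypothetical helper: return the start of the lookback window.

    Raises ValueError for an unknown time_unit or a window longer than the
    warehouse retains query history.
    """
    if time_unit not in UNIT_KWARGS:
        raise ValueError(f"time_unit must be one of {sorted(UNIT_KWARGS)}, got {time_unit!r}")
    window = timedelta(**{UNIT_KWARGS[time_unit]: num_units})
    if window > MAX_LOOKBACK[warehouse]:
        raise ValueError(f"{warehouse} does not retain {window} of query history")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - window
```

For example, `lookback_start(30, "day", "bigquery")` returns the timestamp 30 days before now, while a 181-day BigQuery window raises `ValueError`.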

*Usage*:
```yaml
# properties.yml

version: 2

models:
  - name: some_model
    tests:
      - dbt_model_usage.model_queried_within_last:
          num_units: 30
          time_unit: day
          threshold: 1
```
*^ This example fails if some_model was created more than 30 days ago AND fewer than 1 user query (that is, none) referenced some_model in the last 30 days.*
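
The failure rule described under the example can be written as a single predicate. This Python sketch only restates that rule; it assumes the model's creation time and its user-query count inside the window are already known (the package derives them from the warehouse's query logs and metadata), and `usage_test_fails` is a hypothetical name:

```python
from datetime import datetime


def usage_test_fails(
    created_at: datetime,
    window_start: datetime,
    user_query_count: int,
    threshold: int = 1,
) -> bool:
    """Hypothetical restatement of the documented rule: fail only if the model
    already existed when the lookback window began AND it received fewer than
    `threshold` user queries inside the window."""
    existed_before_window = created_at < window_start
    return existed_before_window and user_query_count < threshold
```

A model created inside the window never fails, which keeps newly added models from being flagged before anyone has had a chance to query them.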

<br>

**column_queried_within_last**

Asserts that the column in the target model has been directly referenced in your users' queries at least `threshold` times within a defined lookback period. As with `model_queried_within_last`, only target models created before the lookback period start time can fail. "User queries" are defined as `SELECT` statements that were not executed by dbt.

*Args*:
- `num_units` - [REQUIRED] Number of time units to look back from the current time.
- `time_unit` - [REQUIRED] The unit of time used for the lookback period. Can be one of: "day" | "hour" | "minute" | "second".
- `threshold` - [OPTIONAL] If the column's user query count within the lookback period is below this number, the test fails. Default value is 1.

*Limitations*: same as for `model_queried_within_last`.

*Usage*:
```yaml
version: 2

models:
  - name: some_model
    columns:
      - name: some_column
        tests:
          - dbt_model_usage.column_queried_within_last:
              num_units: 10
              time_unit: day
              threshold: 1
```
*^ This example fails if some_model was created more than 10 days ago AND fewer than 1 user query (that is, none) referenced some_model.some_column in the last 10 days.*
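
The README does not describe how a "direct reference" to a column is detected. Purely to illustrate the idea, and not as the package's method, the Python sketch below counts queries whose text mentions both the model name and the column name as whole words. Such a heuristic overcounts (another table may have a column with the same name) and misses `SELECT *`; `count_column_references` is a hypothetical helper:

```python
import re


def count_column_references(queries: list[str], model: str, column: str) -> int:
    """Naive, hypothetical heuristic: count queries that mention both the model
    and the column as whole words, ignoring case."""
    model_re = re.compile(rf"\b{re.escape(model)}\b", re.IGNORECASE)
    column_re = re.compile(rf"\b{re.escape(column)}\b", re.IGNORECASE)
    return sum(1 for q in queries if model_re.search(q) and column_re.search(q))
```

The word boundaries keep `some_model_v2` from counting as a reference to `some_model`, but nothing in the text alone can tell which table a bare column name belongs to.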

dbt_project.yml (+14)

```yaml
name: 'dbt_model_usage'
version: '0.1.0'

config-version: 2
require-dbt-version: [">=1.0.0", "<2.0.0"]

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["data"]
macro-paths: ["macros"]
log-path: "logs"
target-path: "target"
clean-targets: ["target", "dbt_modules", "dbt_packages"]
```

dev-requirements.txt (+7)

```text
dbt-core
dbt-postgres
dbt-bigquery
dbt-snowflake
pytest
google-cloud-bigquery
snowflake-connector-python==2.7.9
```

integration_tests/README.md (+28)

# dbt Model Usage - Integration Tests

## Setup

From the [project root directory](..):

1. Set/export the environment variables listed in [.env.sample](../.env.sample)

2. Create a `profiles.yml`
```sh
cp integration_tests/ci/sample.profiles.yml integration_tests/ci/profiles.yml
```

3. Create a development environment and install dependencies
```sh
python3 -m venv venv
. venv/bin/activate
pip install -U pip setuptools wheel
pip install -r dev-requirements.txt
dbt deps --project-dir $DBT_PROJECT_DIR
```

4. Verify that the `integration_tests/dbt_packages/dbt_model_usage` directory was created.

5. Execute tests for a given dbt target:
```sh
$SHELL -e scripts/run_tests.sh [target-name]
```
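
Steps 4 and 5 can be checked up front. A minimal Python sketch, assuming it is run from the project root and that the valid targets are the `bigquery` and `snowflake` outputs defined in the `integration_tests` profile (the same names the CircleCI job passes to `scripts/run_tests.sh`); `preflight` is a hypothetical helper, not part of that script:

```python
from pathlib import Path

# Targets defined in the integration_tests profile and exercised by CircleCI.
KNOWN_TARGETS = {"bigquery", "snowflake"}


def preflight(project_root: Path, target: str) -> list[str]:
    """Hypothetical helper: return setup problems; an empty list means ready."""
    problems = []
    # Step 4: `dbt deps` should have installed the package under test here.
    package_dir = project_root / "integration_tests" / "dbt_packages" / "dbt_model_usage"
    if not package_dir.is_dir():
        problems.append(f"missing {package_dir}; run `dbt deps --project-dir ./integration_tests`")
    # Step 5: the target name must match an output in the profile.
    if target not in KNOWN_TARGETS:
        problems.append(f"unknown target {target!r}; expected one of {sorted(KNOWN_TARGETS)}")
    return problems
```

For example, `preflight(Path.cwd(), "bigquery")` returns an empty list once `dbt deps` has run.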
(+32)

```yaml
config:
  send_anonymous_usage_stats: False
  use_colors: True

integration_tests:
  outputs:

    bigquery:
      type: bigquery
      method: service-account
      keyfile: "{{ env_var('BIGQUERY_SERVICE_KEY_PATH') }}"
      project: "{{ env_var('BIGQUERY_TEST_DATABASE') }}"
      schema: "{{ env_var('TEST_SCHEMA') }}"
      location: us
      threads: 4

    snowflake:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT_ID') }}"
      user: "{{ env_var('SNOWFLAKE_USERNAME') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: "{{ env_var('SNOWFLAKE_ROLE') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      database: "{{ env_var('SNOWFLAKE_TEST_DATABASE') }}"
      schema: "{{ env_var('TEST_SCHEMA') }}"
      threads: 4
      client_session_keep_alive: False
      query_tag: dbt_integration_tests
      connect_retries: 0
      connect_timeout: 10
      retry_on_database_errors: False
      retry_all: False
```
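
Each credential in this profile is read with dbt's `env_var()` function, and dbt raises an error when a referenced variable is unset and no default is given. A rough Python sketch that lists the variables a profile needs and substitutes them from the environment, as a regex approximation for illustration only: dbt renders profiles with Jinja, and this sketch ignores `env_var`'s optional default argument. `required_env_vars` and `render_env_vars` are hypothetical helpers:

```python
import os
import re
from collections.abc import Mapping

# Matches {{ env_var('NAME') }} with single or double quotes; defaults are not handled.
ENV_VAR_CALL = re.compile(r"""\{\{\s*env_var\(\s*['"]([A-Za-z_][A-Za-z0-9_]*)['"]\s*\)\s*\}\}""")


def required_env_vars(profile_text: str) -> set[str]:
    """Hypothetical helper: names of all variables referenced via env_var()."""
    return set(ENV_VAR_CALL.findall(profile_text))


def render_env_vars(profile_text: str, env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: replace each env_var() call with its value from env."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return ENV_VAR_CALL.sub(substitute, profile_text)
```

Running `required_env_vars` over this profile yields the same names as the empty entries in `.env.sample`, plus `SNOWFLAKE_ROLE`, which the sample pre-fills.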

integration_tests/data/seed_table.csv (+8)

```csv
string_field,int_field,float_field
a,1,1.0
b,2,1.1
c,3,1.2
d,4,1.3
e,5,1.4
f,6,1.5
g,7,1.6
```

integration_tests/dbt_project.yml (+18)

```yaml
name: 'dbt_model_usage_integration_tests'
version: '1.0'

profile: 'integration_tests'

config-version: 2

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["data"]
macro-paths: ["macros"]
target-path: "target"
clean-targets: ["target", "dbt_modules", "dbt_packages"]

vars:
  model_usage_dbt_query_comment_pattern: '^\/\*\s+\{"app"\:\s+"dbt".*'
  model_usage_dbt_query_tag_pattern: dbt_integration_tests
```
