Enhancing node selection and state selector #6656

Merged
merged 4 commits into from
Dec 13, 2024
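
Most of the link changes in this diff follow one pattern: an anchor of the form `methods#the-<name>-method` becomes `methods#<name>`. The sketch below illustrates that rewrite as a regular-expression substitution. It is only an illustration of the pattern, not the tooling used for this PR; the function name `shorten_method_anchors` and the regex are assumptions made for the example.

```python
import re

# Matches the old-style anchor on the node selection methods page,
# e.g. "methods#the-state-method" or "methods#the-test_type-method".
# (Hypothetical helper pattern, not taken from the PR.)
OLD_ANCHOR = re.compile(r"(node-selection/methods#)the-([A-Za-z0-9_]+)-method\b")


def shorten_method_anchors(text: str) -> str:
    """Rewrite '#the-<name>-method' anchors to '#<name>'."""
    return OLD_ANCHOR.sub(r"\1\2", text)


old = "[Group selection](/reference/node-selection/methods#the-group-method)"
new = shorten_method_anchors(old)
print(new)  # [Group selection](/reference/node-selection/methods#group)
```

The one change that does not fit this pattern, `syntax#stateful-selection` to `syntax#state-selection` in `ci-jobs.md`, is left untouched by the sketch and would need separate handling.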
2 changes: 1 addition & 1 deletion website/blog/2022-04-14-add-ci-cd-to-bitbucket.md
@@ -197,7 +197,7 @@ Reading the file over, you can see that we:

In summary, anytime anything is pushed to main, we’ll ensure our production database reflects the dbt transformation, and we’ve saved the resulting artifacts to defer to.

-> ❓ **What are artifacts and why should I defer to them?** dbt artifacts are metadata of the last run - what models and tests were defined, which ones ran successfully, and which failed. If a future dbt run is set to ***defer*** to this metadata, it means that it can select models and tests to run based on their state, including and especially their difference from the reference metadata. See [Artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts), [Selection methods: “state”](https://docs.getdbt.com/reference/node-selection/methods#the-state-method), and [Caveats to state comparison](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats) for details.
+> ❓ **What are artifacts and why should I defer to them?** dbt artifacts are metadata of the last run - what models and tests were defined, which ones ran successfully, and which failed. If a future dbt run is set to ***defer*** to this metadata, it means that it can select models and tests to run based on their state, including and especially their difference from the reference metadata. See [Artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts), [Selection methods: “state”](https://docs.getdbt.com/reference/node-selection/methods#state), and [Caveats to state comparison](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats) for details.

### Slim Continuous Integration: Retrieve the artifacts and do a state-based run

2 changes: 1 addition & 1 deletion website/docs/docs/build/exposures.md
@@ -77,5 +77,5 @@ When we generate the [dbt Explorer site](/docs/collaborate/explore-projects), yo
## Related docs

* [Exposure properties](/reference/exposure-properties)
-* [`exposure:` selection method](/reference/node-selection/methods#the-exposure-method)
+* [`exposure:` selection method](/reference/node-selection/methods#exposure)
* [Data health tiles](/docs/collaborate/data-tile)
2 changes: 1 addition & 1 deletion website/docs/docs/build/groups.md
@@ -119,4 +119,4 @@ dbt.exceptions.DbtReferenceError: Parsing Error

* [Model Access](/docs/collaborate/govern/model-access#groups)
* [Group configuration](/reference/resource-configs/group)
-* [Group selection](/reference/node-selection/methods#the-group-method)
+* [Group selection](/reference/node-selection/methods#group)
2 changes: 1 addition & 1 deletion website/docs/docs/collaborate/govern/model-versions.md
@@ -99,7 +99,7 @@ Let's say that `dim_customers` has three versions defined: `v2` is the "latest",

As you'll see in the implementation section below, a versioned model can reuse the majority of its YAML properties and configuration. Each version needs to only say how it _differs_ from the shared set of attributes. This gives you, as the producer of a versioned model, the opportunity to highlight the differences across versions—which is otherwise difficult to detect in models with dozens or hundreds of columns—and to clearly track, in one place, all versions of the model which are currently live.

-dbt also supports [`version`-based selection](/reference/node-selection/methods#the-version-method). For example, you could define a [default YAML selector](/reference/node-selection/yaml-selectors#default) that avoids running any old model versions in development, even while you continue to run them in production through a sunset and migration period. (You could accomplish something similar by applying `tags` to these models, and cycling through those tags over time.)
+dbt also supports [`version`-based selection](/reference/node-selection/methods#version). For example, you could define a [default YAML selector](/reference/node-selection/yaml-selectors#default) that avoids running any old model versions in development, even while you continue to run them in production through a sunset and migration period. (You could accomplish something similar by applying `tags` to these models, and cycling through those tags over time.)

<File name="selectors.yml">

@@ -46,7 +46,7 @@ Historically, dbt's test coverage was confined to [“data” tests](/docs/build

In v1.8, we're introducing native support for [unit testing](/docs/build/unit-tests). Unit tests validate your SQL modeling logic on a small set of static inputs __before__ you materialize your full model in production. They support a test-driven development approach, improving both the efficiency of developers and the reliability of code.

-Starting from v1.8, when you execute the `dbt test` command, it will run both unit and data tests. Use the [`test_type`](/reference/node-selection/methods#the-test_type-method) method to run only unit or data tests:
+Starting from v1.8, when you execute the `dbt test` command, it will run both unit and data tests. Use the [`test_type`](/reference/node-selection/methods#test_type) method to run only unit or data tests:

```shell

@@ -101,7 +101,7 @@ The ability for installed packages to override built-in materializations without

### Quick hits

-- [`state:unmodified` and `state:old`](/reference/node-selection/methods#the-state-method) for [MECE](https://en.wikipedia.org/wiki/MECE_principle) stateful selection
+- [`state:unmodified` and `state:old`](/reference/node-selection/methods#state) for [MECE](https://en.wikipedia.org/wiki/MECE_principle) stateful selection
- [`invocation_args_dict`](/reference/dbt-jinja-functions/flags#invocation_args_dict) includes full `invocation_command` as string
- [`dbt debug --connection`](/reference/commands/debug) to test just the data platform connection specified in a profile
- [`dbt docs generate --empty-catalog`](/reference/commands/cmd-docs) to skip catalog population while generating docs
@@ -54,5 +54,5 @@ GitHub discussion with details: [dbt-labs/dbt-core#6011](https://github.com/dbt-

### Quick hits
- **["Full refresh"](/reference/resource-configs/full_refresh)** flag supports a short name, `-f`.
-- **[The "config" selection method](/reference/node-selection/methods#the-config-method)** supports boolean and list config values, in addition to strings.
+- **[The "config" selection method](/reference/node-selection/methods#config)** supports boolean and list config values, in addition to strings.
- Two new dbt-Jinja context variables for accessing invocation metadata: [`invocation_args_dict`](/reference/dbt-jinja-functions/flags#invocation_args_dict) and [`dbt_metadata_envs`](/reference/dbt-jinja-functions/env_var#custom-metadata).
@@ -32,7 +32,7 @@ See GitHub discussion [dbt-labs/dbt-core#5468](https://github.com/dbt-labs/dbt-c
- **[Grants](/reference/resource-configs/grants)** are natively supported in `dbt-core` for the first time. That support extends to all standard materializations, and the most popular adapters. If you already use hooks to apply simple grants, we encourage you to use built-in `grants` to configure your models, seeds, and snapshots instead. This will enable you to [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) up your duplicated or boilerplate code.
- **[Metrics](/docs/build/build-metrics-intro)** now support an `expression` type (metrics-on-metrics), as well as a `metric()` function to use when referencing metrics from within models, macros, or `expression`-type metrics. For more information on how to use expression metrics, check out the [**`dbt_metrics` package**](https://github.com/dbt-labs/dbt_metrics)
- **[dbt-Jinja functions](/reference/dbt-jinja-functions)** now include the [`itertools` Python module](/reference/dbt-jinja-functions/modules#itertools), as well as the [set](/reference/dbt-jinja-functions/set) and [zip](/reference/dbt-jinja-functions/zip) functions.
-- **[Node selection](/reference/node-selection/syntax)** includes a [file selection method](/reference/node-selection/methods#the-file-method) (`-s model.sql`), and [yaml selector](/reference/node-selection/yaml-selectors) inheritance.
+- **[Node selection](/reference/node-selection/syntax)** includes a [file selection method](/reference/node-selection/methods#file) (`-s model.sql`), and [yaml selector](/reference/node-selection/yaml-selectors) inheritance.
- **[Global configs](/reference/global-configs/about-global-configs)** now include CLI flag and environment variable settings for [`target-path`](/reference/global-configs/json-artifacts) and [`log-path`](/reference/global-configs/logs), which can be used to override the values set in `dbt_project.yml`

### Specific adapters
@@ -45,7 +45,7 @@ Expected a schema version of "https://schemas.getdbt.com/dbt/manifest/v5.json" i

### Advanced and experimental functionality

-**Fresh Rebuilds.** There's a new _experimental_ selection method in town: [`source_status:fresher`](/reference/node-selection/methods#the-source_status-method). Much like the `state:` and `result` methods, the goal is to use dbt metadata to run your DAG more efficiently. If dbt has access to previous and current results of `dbt source freshness` (the `sources.json` artifact), dbt can compare them to determine which sources have loaded new data, and select only resources downstream of "fresher" sources. Read more in [Understanding State](/reference/node-selection/syntax#about-node-selection) and [CI/CD in dbt Cloud](/docs/deploy/continuous-integration).
+**Fresh Rebuilds.** There's a new _experimental_ selection method in town: [`source_status:fresher`](/reference/node-selection/methods#source_status). Much like the `state:` and `result` methods, the goal is to use dbt metadata to run your DAG more efficiently. If dbt has access to previous and current results of `dbt source freshness` (the `sources.json` artifact), dbt can compare them to determine which sources have loaded new data, and select only resources downstream of "fresher" sources. Read more in [Understanding State](/reference/node-selection/syntax#about-node-selection) and [CI/CD in dbt Cloud](/docs/deploy/continuous-integration).


[**dbt-Jinja functions**](/reference/dbt-jinja-functions) have a new landing page, and two new members:
@@ -247,7 +247,7 @@ BigQuery:
## New and changed documentation

**Core**
-- [`path:` selectors](/reference/node-selection/methods#the-path-method)
+- [`path:` selectors](/reference/node-selection/methods#path)
- [`--fail-fast` command](/reference/commands/run#failing-fast)
- `as_text` Jinja filter: removed this defunct filter
- [accessing nodes in the `graph` object](/reference/dbt-jinja-functions/graph)
2 changes: 1 addition & 1 deletion website/docs/docs/deploy/ci-jobs.md
@@ -146,7 +146,7 @@ For semantic nodes and models that aren't downstream of modified models, dbt Clo

<Expandable alt_header="Semantic nodes that are modified or affected by downstream modified nodes.">

-To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/syntax#stateful-selection)):
+To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/syntax#state-selection)):

```bash
dbt sl validate --select state:modified+
4 changes: 2 additions & 2 deletions website/docs/guides/set-up-ci.md
@@ -50,7 +50,7 @@ Use the **Continuous Integration Job** template, and call the job **CI Check**.
In the Execution Settings, your command will be preset to `dbt build --select state:modified+`. Let's break this down:

- [`dbt build`](/reference/commands/build) runs all nodes (seeds, models, snapshots, tests) at once in DAG order. If something fails, nodes that depend on it will be skipped.
-- The [`state:modified+` selector](/reference/node-selection/methods#the-state-method) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs.
+- The [`state:modified+` selector](/reference/node-selection/methods#state) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs.

To be able to find modified nodes, dbt needs to have something to compare against. dbt Cloud uses the last successful run of any job in your Production environment as its [comparison state](/reference/node-selection/syntax#about-node-selection). As long as you identified your Production environment in Step 2, you won't need to touch this. If you didn't, pick the right environment from the dropdown.
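
The selection idea behind `state:modified+` (modified nodes plus everything downstream of them) can be sketched as a graph traversal. This is only an illustration of the concept, not how dbt implements state comparison; the function `modified_plus`, the DAG, and the model names are hypothetical, and the set of modified nodes is taken as given rather than computed from artifacts.

```python
from collections import deque


def modified_plus(children: dict[str, list[str]], modified: set[str]) -> set[str]:
    """Return the modified nodes plus every node downstream of them."""
    selected = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Hypothetical DAG: each node maps to its direct children.
dag = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["dim_customers"],
    "fct_orders": ["orders_report"],
}

# Only stg_orders changed, so the customers branch is not selected.
print(sorted(modified_plus(dag, {"stg_orders"})))
# ['fct_orders', 'orders_report', 'stg_orders']
```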

@@ -344,7 +344,7 @@ Use the **Continuous Integration Job** template, and call the job **QA Check**.
In the Execution Settings, your command will be preset to `dbt build --select state:modified+`. Let's break this down:

- [`dbt build`](/reference/commands/build) runs all nodes (seeds, models, snapshots, tests) at once in DAG order. If something fails, nodes that depend on it will be skipped.
-- The [`state:modified+` selector](/reference/node-selection/methods#the-state-method) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs.
+- The [`state:modified+` selector](/reference/node-selection/methods#state) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs.

To be able to find modified nodes, dbt needs to have something to compare against. Normally, we use the Production environment as the source of truth, but in this case there will be new code merged into `qa` long before it hits the `main` branch and Production environment. Because of this, we'll want to defer the Release environment to itself.
