Make it possible for DBT to generate models from a piece of JINJA code #5101
Replies: 16 comments 13 replies
-
@bashyroger Thanks for opening! Apologies for the delay in my response. There's a lot here to chew on. Today, one of the foundational assumptions underlying dbt is that 1 model file = 1 model node = 1 object (view or table) in the database. What's the problem here? Whenever we want a new dbt-managed database object, it requires a PR to add a new model file. That's hardly an automated approach to data warehousing. I've seen three quite different approaches that try to solve this problem:
At the risk of writing much too much here, I want to discuss each approach in some detail.

1. One model, many objects

In the most common use case, someone wants a large model result—same logic, same columns—to be split in the database across multiple tables, with perhaps a few parametrized differences across them. There may be a good reason (such as PII) that each subset needs to be isolated from other records, with the same column schema, transformed in the exact same manner. We see this sometimes with marketing or consulting agencies who offer modeled data as a service to their clients. There are parts about this that are tricky for dbt today:
In the meantime, while this functionality isn't natively supported, I've seen valid approaches here go in two different directions:
I wanted to bring up this approach because these have often been part of the same conversation. It's not exactly the issue you describe here, and I don't think it would apply nearly as well to unstructured data that doesn't share a common column schema.

2. One file, many models

This is an old, old issue: #184. There was a long time when we were convinced that model blocks were the ultimate way to go. They've fallen out of fashion recently, but this vein of thinking was especially popular around the time that we created snapshots (dbt v0.14.0, summer 2019).

-- models/many_models.sql
{% model stg_stripe_payments %}
select * from {{ source('stripe', 'payments') }}
where is_deleted = false
{% endmodel %}
{% model net_revenue %}
{{ config(materialized = 'table') }}
select
orders.order_id,
payments.amount - orders.cogs as net_revenue
from {{ ref('stg_stripe_payments') }} as payments
join {{ ref('stg_orders') }} as orders
on payments.order_id = orders.order_id
{% endmodel %}

There are a few things that model blocks have going for them:
This may still be a direction we head in, ultimately. We won't get there before dbt v1.0, but who's to say it couldn't be an essential component of dbt v2.0? I bring up model blocks here because the prospect of decoupling 1 model-per-file also suggests the possibility of taking this further, by treating the larger file as a single Jinja template that could produce many model blocks dynamically:

-- models/many_models.sql
{% for source in graph.sources.values() | selectattr('source_name', 'equalto', 'stripe') | list %}
{% model stg_stripe_{{ source.name }} %} {# how would this work? #}
select * from {{ source(source.source_name, source.name) }}
where is_deleted = false
{% endmodel %}
{% endfor %}

If you try to do this with snapshots today, you'll get an error message:
For good reason: this gets really complex to parse, and it makes us even more reliant on Jinja at a time when we're investing in static analysis to speed up parse-time manifest construction. There's an alternative syntax toyed with in #1374 (comment). That issue was about dynamic columns, to avoid repeating the same yaml properties over and over. (That's ground we've since retrod in #2995, among other places, and we're definitely interested in thinking about it more.) There, Drew mentioned the theoretical idea of a more imperative way to register models:

-- models/many_models.sql
{% for source in graph.sources.values() | selectattr('source_name', 'equalto', 'stripe') | list %}
{% set model_sql %}
select * from {{ source(source.source_name, source.name) }}
{% endset %}
{% set model_name = 'stg_' + source.source_name + '_' + source.name %}
{% do manifest.add_model(name=model_name, sql=model_sql) %}
{% endfor %}

While it's easier to imagine writing code like this, it would still be fiendishly complex to handle at parse time. This approach quickly runs into a big limitation in how dbt works today, given these two premises:
So the SQL for each individual model could be dynamically generated, just as it can today, based on the latest content or metadata in the database. But it would not be possible to create more or fewer models on the basis of information stored in the database—exactly the sort of use case that your issue is getting at. Fuller capabilities would require big foundational changes to the way that dbt works.

3. Code generation

After all, maybe dbt cannot be—ought not be—a data warehouse automator itself, but merely the best-possible substrate for "true" DWH automation? I feel this gets at the heart of the matter. In my early years using dbt, I thought the extensibility and flexibility of its tooling were its strongest features; with Jinja, all things are possible, certainly compared to SQL. Nowadays, I'm inclined to think that dbt's greatest strength is the rigidity of its opinionated framework, its demand for decisiveness and verbosity in the places that matter. The guardrails are there for a reason.

So, let's say we keep living in a world where one database object has to be one model has to be one file (or at least one Jinja block). An automated process could still be of tremendous help by generating those files; opening PRs; running CI checks; requiring a human reviewer (or not); merging; reverting if necessary. In that world, data warehouse automation is a time-lapse photograph: a double-speed version of the current git-based dbt development workflow, with human intervention at only the appropriate moments. The key difference is that data warehouse automation is decoupled from data warehouse execution—their handoff point is dbt model code, explicitly defined.

What should that automated process look like? Should the premier executor of dbt code also try to be its own code generator? Or should there be a separate tool? There are fair arguments on either side:
Why not both?

tl;dr

Eventually, I could see us doing pieces of all three of the things mentioned above:
Ultimately, I find the possibilities offered by the third approach most compelling. Having written 1500 words on dbt and model automation, I'm coming around to the idea that a native dbt task (#1082), or even just a way to plug together existing tools, could be the right starting point.
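For a concrete taste of what that plumbing can look like today, the dbt-codegen package already exposes macros such as generate_base_model that print boilerplate staging SQL; an automated process (or a human) can save that output as a model file and open a PR. A minimal sketch, reusing the stripe/payments example from above:

# prints boilerplate staging SQL for a source table; the output still needs to be
# saved into a model file by whatever automation wraps this command
dbt run-operation generate_base_model \
  --args '{"source_name": "stripe", "table_name": "payments"}'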
-
A masterful play in three acts! Thank you Jeremy.
-
We built workflows like this at Aula and we'll be talking about it at Coalesce 2021! Jeremy's description, above, is an elegant summary of what our automation does. Source schema files, snapshots, and the first staging layer are all auto-generated and auto-maintaining. The automation relies on the following components:
I didn't know about dbt-helper until just now, so I'm interested to take a look and see how importing dbt-core directly might help standardize the methods and make the code more usable for others. Assuming my coworker from Aula is ok with it, I'd also love to contribute it somewhere in an open-source way. I see somewhat frequent posts over in the dbt Slack from people looking for similar functionality.
-
Great discussion. If I get it right, approaches 1 and 2 above do not currently work, correct? Even if we end up with just approach 3, a "template" or reference implementation would be handy.
-
My team would also like this feature!
-
I'm curious if the ability from Jinja2 to extend from another template and replace predefined blocks of code (template inheritance) was considered for this feature? Then a user could define something along the lines of the sketch below.
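For readers unfamiliar with the concept, here is a minimal sketch of plain Jinja2 template inheritance. This is standard Jinja2 rather than anything dbt supports for models today, and the file names, block names, and columns are made up for illustration:

{# base_staging.sql.j2: a hypothetical parent template with overridable blocks #}
select
    {% block columns %}*{% endblock %}
from {% block source_table %}raw.some_table{% endblock %}
where {% block filter %}true{% endblock %}

{# stg_stripe_payments.sql.j2: a child template overriding only what differs #}
{% extends "base_staging.sql.j2" %}
{% block columns %}id, amount, status{% endblock %}
{% block source_table %}raw.stripe_payments{% endblock %}
{% block filter %}is_deleted = false{% endblock %}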
-
We would still really love this feature!
-
would love this feature!
-
Hi @Gwildor, there is an old feature request about Jinja2 template inheritance: #1337
-
My use case would be for a multi-tenant platform where we create an access layer (view) for every tenant. I would like to configure a list of tenant identifiers and generate a view for each tenant using one generic jinja template.
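For what it's worth, the closest approximation available today (without new dbt features) is probably a run-operation macro that loops over a configured tenant list and creates one view per tenant on top of a single shared model. A rough sketch, in which the macro name, the tenants var, the orders model, and the schema naming are all assumptions; note that views created this way are not dbt nodes, so they won't appear in the DAG or docs:

-- macros/create_tenant_views.sql (hypothetical workaround; run with
-- dbt run-operation create_tenant_views --vars '{"tenants": ["acme", "globex"]}')
{% macro create_tenant_views() %}
  {% for tenant in var('tenants', []) %}
    {# one access schema per tenant, named after the target schema plus the tenant id #}
    {% do run_query('create schema if not exists ' ~ target.schema ~ '_' ~ tenant) %}
    {% set sql %}
      create or replace view {{ target.schema }}_{{ tenant }}.orders as
      select * from {{ ref('orders') }}
      where tenant_id = '{{ tenant }}'
    {% endset %}
    {% do run_query(sql) %}
  {% endfor %}
{% endmacro %}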
-
Thanks for your extensive comment @jtcohen6, I have been giving your writing a bit more thought.
I still think it can / should be, partially. Partially by indeed introducing 'template models': models with a recursive (Jinja) loop that can spawn the creation of multiple model files. From what I can infer from your writing, this initial request would largely be fulfilled if #3428 were implemented. Regarding your comment on the parse-time vs. execution-time problem: those template models would not have to run at execution time initially. As a developer, I would purely see them as a way to automate the creation of multiple model files from a template, something that the current codegen package indeed cannot do. Practically, I would expect a command to exist for generating the files from such a template (a hypothetical sketch follows below). Then later, further along the road, I could see them being added to the execution context as well.
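To make that concrete, here is one purely hypothetical sketch of such a 'template model'. None of this syntax exists in dbt today: the template_model tag, the file location, and a generate-style command that would expand it into one physical model file per Stripe source table are all invented for illustration.

-- models/_templates/stg_stripe.sql (hypothetical syntax, not valid dbt today)
{% for src in graph.sources.values() | selectattr('source_name', 'equalto', 'stripe') %}
{% template_model 'stg_stripe_' ~ src.name %}
select * from {{ source(src.source_name, src.name) }}
where is_deleted = false
{% end_template_model %}
{% endfor %}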
-
This would be amazing!
-
Seems like an extremely important feature that is missing in dbt when we are talking about scalability. Discussion started in 202, is there any "movement" towards achieving this? An example of how easy it is to achieve this within Dataform: https://docs.dataform.co/guides/javascript/js-api#example-dynamic-dataset-generation
-
I would like to suggest an alternative solution not covered in @jtcohen6's original response post, let's call it...

4. Native templated models

One pattern I see amongst some dbt developers is a templating step to generate dbt models prior to running dbt. The main pain dbt developers experience with the "one model, one database object" constraint is that a new file must be created and maintained for each transformation. When there are 100s (or more) of transformations that can be expressed with a single macro, it becomes a burden to the developer to add and maintain transformations via the filesystem. It has emerged in the developer base that there is a common desire to manage individual transformations using fewer files (e.g., YAML files), possibly in a tabular format (e.g., spreadsheet, CSV, or database table). I propose here that dbt adds the capability for native templated models, with the main purpose of eliminating pre-processing templating steps for models that can be expressed as a single macro. The benefits should be an improved developer experience and a streamlined workflow. In essence, dbt developers who use a native templated model would be managing metadata about their transformations with a YAML file. This feature could also unlock some additional functionality for dbt down the line, as a templated model would be considerably more structured than a free-form SQL or Python model -- it would be machine-readable.

Existing templating examples (workarounds)

In both cases shown below, developers have the ability to express their transformations in configuration. Then, they run a pre-processing step to generate dbt models. Finally, they run dbt.

Example 1: Data Vault dbt packages

Two example projects in the Data Vault space are turbovault4dbt and dbtvault-generator. Both of these projects allow developers to manage their transformations in CSV files or database tables by specifying parameters to the Hubs, Links, and Satellites macros in datavault4dbt and AutomateDV respectively.

Before:
After running the pre-processing step:
Example 2: Custom generic templating

I have also seen presentations and workflows by other users who have built custom scripts to achieve similar results. One such templated model looks as follows (I call this one "yo dawg, I heard you like Jinja").

Before:
After running the pre-processing step:
Proposed solution

Allow developers to add dbt models via YAML configuration (a hypothetical sketch of what that could look like follows below). Then, with these models defined in YAML, the additional models would appear in the DAG as though they were separate SQL files. This would not require explicit generation of physical SQL files prior to running dbt -- they would be interpolated during dbt runtime.
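As a sketch of what that YAML could look like, assuming a hypothetical templated_models property and a generic staging macro (neither of which exists in dbt today):

# models/staging/templated_models.yml (hypothetical syntax)
templated_models:
  - name: stg_stripe_payments
    macro: generate_staging_model   # hypothetical macro that returns the model SQL
    args:
      source_name: stripe
      table_name: payments
  - name: stg_stripe_customers
    macro: generate_staging_model
    args:
      source_name: stripe
      table_name: customers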
This does not allow for complete dynamism in model generation. Notably, there are no programming constructs like for-loops and if-statements. But I believe this would suffice for many use cases of "generated models", particularly in the Data Vault space.

Related art
-
This would be great!
-
My team is moving from Dataform to dbt because it seemed to be a much more powerful tool. I'm trying to migrate a model from Dataform and just can't believe that I won't be able to do that because dbt doesn't have this feature yet :( When will this type of feature be available?
-
Describe the feature
While I like DBT a lot, it is currently quite limited in supporting true metadata-based 'data warehouse automation'. Sure, a lot of things are automated for you in DBT:
However, what is lacking is the possibility for DBT to:
Describe alternatives you've considered
At my current client we store our raw data in a Snowflake variant column, with the schema metadata of that column in a separate variant column.
We do this to keep loading data simple and to never be affected by source schema changes in the raw data zone.
To actually use this data, however, we have written a piece of Python code that parses the stored schema metadata to generate a series of DBT models (materialized as views) on top of the raw data tables.
While this all works fine, this code is obviously disconnected from DBT, which currently hinders development as it requires code/environment/context switching.
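For context, a single generated model of this kind might look roughly like the following. This is a hypothetical sketch: the source, column names, and JSON paths are invented, but the pattern of casting fields out of a Snowflake variant column is the relevant part.

-- models/parsed/stg_orders_parsed.sql (hypothetical generated model)
{{ config(materialized='view') }}

select
    raw_payload:order_id::integer        as order_id,
    raw_payload:customer:email::varchar  as customer_email,
    raw_payload:amount::number(38, 2)    as amount,
    loaded_at
from {{ source('raw', 'orders') }}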
Additional context
Who will this benefit?
Everyone who wants to do more extensive data warehouse automation using DBT.
Are you interested in contributing this feature?
I am not a solid programmer, but I would definitely like to contribute to the requirements and use cases, and to beta-test this kind of functionality.