
[ADAP-432] [Feature] Snowflake Adapter supports INSERT OVERWRITE #736


Closed · 3 tasks done
markproctor1 opened this issue Apr 4, 2023 · 30 comments · Fixed by #855
Assignees
Labels
help-wanted (Extra attention is needed), pkg:dbt-snowflake (Issue affects dbt-snowflake), type:enhancement (New feature request)

Comments

@markproctor1

markproctor1 commented Apr 4, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-snowflake functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Snowflake's INSERT statement supports an optional OVERWRITE parameter that truncates the target table before inserting. It runs as a single transaction and avoids dropping or recreating the table.

https://docs.snowflake.com/en/sql-reference/sql/insert#optional-parameters
INSERT [ OVERWRITE ] INTO <target_table>
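A minimal sketch with made-up table names: this replaces the entire contents of the target in one transaction, without dropping or recreating the table object.

-- Hypothetical example: full reload of analytics.orders in a single
-- transaction. The table object, its grants, and its history are untouched;
-- only the rows are replaced.
insert overwrite into analytics.orders
    select * from staging.orders;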

Support this feature in dbt

Describe alternatives you've considered

There is no single-transaction alternative: TRUNCATE (or DELETE *) followed by INSERT takes two statements. This is documented by Snowflake.

Who will this benefit?

Anyone who needs to avoid DROP/RECREATE but still wants a full load will benefit, with the added bonus that the operation stays within a single transaction.

In our use case, we don't want tables to be dropped, as the table has permissions and policies that need to stay in place regardless of which process is doing the load. At the same time, the rows of the table are completely replaced during each load.
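As a hedged illustration of that use case (all names invented), grants attached to the table survive the load because the table object itself is never dropped:

-- The grant stays attached across loads because the table is never dropped.
grant select on table analytics.orders to role reporting;

insert overwrite into analytics.orders
    select * from staging.orders;

-- By contrast, CREATE OR REPLACE TABLE produces a new table object and
-- drops existing grants unless COPY GRANTS is specified.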

Are you interested in contributing this feature?

Yes. I need to understand materializations better, but in my naivete this seems like a simple and valuable feature.

Anything else?

Matt Winkler already built this:
https://github.com/matt-winkler/dbt_workspace/blob/master/dbt/macros/materializations/snowflake_custom_incremental/insert_overwrite.sql
(directory: https://github.com/matt-winkler/dbt_workspace/tree/master/dbt/macros/materializations/snowflake_custom_incremental)

@markproctor1 markproctor1 added type:enhancement New feature request triage:product In Product's queue labels Apr 4, 2023
@github-actions github-actions bot changed the title [Feature] Snowflake Adapter supports INSERT OVERWRITE [ADAP-432] [Feature] Snowflake Adapter supports INSERT OVERWRITE Apr 4, 2023
@Fleid
Contributor

Fleid commented Apr 5, 2023

Hey there!

I'm still trying to understand the scope of work here.

Is the ask to add a model config, applicable to table and incremental only, indicating that CREATE OR REPLACE should be avoided in favor of INSERT OVERWRITE?

That would mean normal runs for table, and full-refresh runs for both table and incremental, would be impacted. Is that a fair description of the requirements?

@Fleid Fleid self-assigned this Apr 5, 2023
@adammarples

My understanding is that this would be best as a new incremental strategy

incremental_strategy: insert_overwrite

+1 for this feature!

@Fleid
Contributor

Fleid commented Apr 6, 2023

Hi @adammarples - can you please tell me more?

On the adapters where we support an insert_overwrite strategy for incremental models, we do so because the overwrite happens at the partition level. We only drop and regenerate the partitions where data has actually changed.
In Snowflake this is not possible, as INSERT OVERWRITE truncates the entire table. So to me it's not incremental anymore? Am I missing something?
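To make the contrast concrete (an illustrative sketch, names made up): even when the select is filtered, OVERWRITE truncates the whole table first, so nothing outside the filter survives.

-- After this runs, analytics.events contains ONLY today's rows;
-- every previously loaded day is gone. That's why it isn't "incremental".
insert overwrite into analytics.events
    select * from staging.events
    where event_date = current_date;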

@markproctor1
Author

@Fleid It would be better named if it weren't listed as incremental, but some of the other behaviors of the table materialization, such as potentially dropping/replacing the table, are to be avoided with INSERT OVERWRITE.

@Fleid
Contributor

Fleid commented Apr 6, 2023

@markproctor1 I think we agree here. There is value, but it's not limited to incremental models. So it's not an incremental strategy, but rather an overall switch that replaces all drop/replace operations with INSERT OVERWRITE.

@markproctor1
Author

@Fleid What goes into the decision to make it a switch rather than an optional part of table materialization? Someone out there probably LOVES the fact that their tables get dropped :)

@Fleid
Contributor

Fleid commented Apr 6, 2023

I think I covered the fact that it was not a strategy for incremental models.

Could it be a materialization by itself? Materializations are about the type of object being created in the database. We have views, tables, CTEs (ephemeral models), and soon materialized views. The only exception is incremental models, which do generate tables, but needed to be clearly delineated from tables because the handling mechanism is widely different. This is me post-rationalizing, because I wasn't there for that decision, but I'm glad it's not a switch on tables.

Is using INSERT OVERWRITE deserving of its own materialization? It still results in a table, and the ergonomics of using it, from the dbt user's standpoint, are minimal: yes I want it, or no. That's why I'm seeing it as an option on table and incremental.

@markproctor1
Author

This is intuitive to me:
{{ config(materialized='table', insert_overwrite=true) }}

However, "insert overwrite" as a concept can be full load or incremental:

https://docs.getdbt.com/reference/resource-configs/glue-configs#the-insert_overwrite-strategy

If no partition_by is specified, then the insert_overwrite strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running truncate + insert on other databases. For atomic replacement of Delta-formatted tables, use the table materialization (which runs create or replace) instead.

So there's a conceptual disagreement: outside of dbt, the term can mean either table or incremental behavior. In Snowflake, it corresponds to materialized='table'.

@Fleid
Contributor

Fleid commented Apr 7, 2023

I agree! I don't think we should name it insert_overwrite, despite the underlying statement being INSERT OVERWRITE.

Maybe something like:

{{ config(materialized='table', overwrite=true) }}

After more thinking, maybe we should narrow it down to only non --full-refresh runs for table:

  • It's not applicable to non --full-refresh runs of incremental models, because it truncates the whole table, which defeats the purpose of incremental processing.
  • It's not applicable to --full-refresh runs of tables and incremental models, because the expectation of --full-refresh is that tables will be dropped and re-created. Wiping things clean is the point there.
  • It's not applicable to views and ephemeral models (CTEs), for obvious reasons.

Now what we need to figure out, is what happens when the model has changed, and is now different from the table in the database?

What happens if I dbt run with this model:

{{ config(materialized='table', overwrite=true) }}
select 1 as c1, 'two' as c2

When the table in the database looks like:

| c1 (number) | c2 (number) |
| --- | --- |
| 1 | 2 |

Here the INSERT OVERWRITE will fail on c2 with a type mismatch.
Should we just let that happen? Or should we be clever and somehow re-use on_schema_change from incremental?

{{ config(materialized='table', overwrite=true, on_schema_change='sync_all_columns') }}
select 1 as c1, 'two' as c2

I'm thinking we should not be clever. This is supposed to be a simple optimization, not a complicated piece of logic. I'm interested in your opinion.
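For reference, the failure mode described above can be reproduced directly in Snowflake (names illustrative):

create or replace table t (c1 number, c2 number);
insert into t values (1, 2);

-- Fails with a numeric conversion error: 'two' cannot be cast to NUMBER.
-- The existing rows are preserved, since the statement never commits.
insert overwrite into t
    select 1 as c1, 'two' as c2;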

@markproctor1
Author

Totally agree, no need to be clever here.

In our case we'd actually prefer to know if the table schemas don't line up: fast failure rather than handling the schema change. We tested this in the custom materialization to make sure that dbt doesn't modify the schema on a mismatch.

{{ config(materialized='table', overwrite=true) }}

Your way looks better to me: same concept with fewer characters.

overwrite being true is binary; there is no consideration of partitions, even though partitioning might be a factor on other adapters. So you get the idea of replacing ALL rows without changing the DDL.

It's an overwrite, not a replace. I like it.

@Fleid
Contributor

Fleid commented Apr 7, 2023

Since you already have some code there, would you be interested in contributing a PR?
I'll flag this issue as help_wanted if you do ;)

@adammarples

@Fleid yes my mistake, it would be a table materialization, not an incremental strategy

@markproctor1
Author

@Fleid I'm trying to decide when to fit this in, as sprint work or on the side. I am interested, will get back to you asap.

@jtcohen6
Contributor

This is probably how seeds should work on Snowflake, instead of truncate + insert. It would be a way of resolving #740 / dbt-labs/dbt-snowflake#135.

This makes less sense to me for incremental models (since it always replaces the full contents, unlike partition-based insert overwrite on other platforms) and for table models (since it wouldn't work if the columns of the table have changed between runs).

@markproctor1
Author

@jtcohen6 Your comment makes sense about seeds, but I don't know enough to say whether it should be in scope here.

Regarding table models: do you expect materialized='table', overwrite=true to succeed if the columns have since changed? In our use case, if we want INSERT OVERWRITE we don't expect column changes. If we expected column changes and wanted full-load behavior, we'd aim for create or replace or something else.

What is the dbtonic way of resolving target-table changes not handled by the specified materialization? If it were incremental and the target object were no longer compatible, it would just error, right?

@Fleid
Contributor

Fleid commented Apr 17, 2023

As discussed, I would not be clever. If we limit the scope of overwrite to simple runs for tables, then INSERT OVERWRITE is an optimization that you should only use when the model is stable. If you get an error from Snowflake because your model changed, you need to run a --full-refresh to resolve it. That sounds fair to me.
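In practice that recovery path is just the standard flag, shown here for completeness (the model name is a placeholder):

# Drop and re-create the model's table from scratch after a schema change
dbt run --select my_model --full-refresh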

@Fleid Fleid added help-wanted Extra attention is needed and removed triage:product In Product's queue labels Apr 17, 2023
@dataders dataders added the triage:product In Product's queue label Jun 12, 2023
@dataders dataders added the feature:incremental Issues related to incremental materializations label Jun 20, 2023
@markproctor1
Author

@Fleid Our teams are discussing this again. We're finding that the benefit of INSERT OVERWRITE in our custom materialization isn't worth the extra dev/maintenance time across table schema changes. So, as you say, it would be best to have --full-refresh.

@dataders I'm not sure this is incremental, in the sense of incremental materialization, as we've discussed it.

@dataders
Contributor

> We're finding that the benefit of INSERT OVERWRITE in our custom materialization isn't worth the extra dev/maintenance time across table schema changes. So, as you say, it would be best to have --full-refresh.

@markproctor1 my read here is that you and your team no longer think that including Snowflake's INSERT OVERWRITE in dbt would be a useful addition?

@dataders dataders removed feature:incremental Issues related to incremental materializations triage:product In Product's queue labels Aug 17, 2023
@markproctor1
Author

@dataders No, I was unclear! We want it more than ever but think it should be handled as a table materialization instead of an incremental materialization.

@dataders
Contributor

@markproctor1 do you care to open a PR that modifies the existing table materialization? I think that'd make this discussion much more focused.

@Fleid Fleid removed their assignment Feb 14, 2024
@umarh-ccy

Hello, is this still in progress, or are there any plans for it? The key feature for us is the underlying Time Travel capability, which gets lost when we drop/replace tables.

@james-mcp-cdi

Checking on the status of this. Did the INSERT OVERWRITE concept get moved to a different PR?
We want to retain Time Travel on our mart-level objects.

@markproctor1
Author

I don't know whether it moved to a different PR; I ran into some design decisions over the months.

Snowflake Insert Overwrite

  • It doesn't cleanly fit into either full load or incremental
    • The rows are a full load, but there shouldn't be any drop/replace table behavior
  • The term "insert overwrite" in Snowflake is not universal, so it won't match other adapters' terminology in dbt
    • It doesn't match BigQuery, for example
  • When the column datatypes don't match between the dbt definition and the existing target, handling the change is complex
    (One reason we use INSERT OVERWRITE is precisely so that columns and tables aren't dropped.)

We still use Matt Winkler's custom materialization for all our most important work, but I don't know how to design this.
@dataders @matt-winkler @james-mcp-cdi
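For readers landing here, a stripped-down sketch of what such a custom materialization can look like, under the assumptions discussed above. This is not Matt Winkler's actual code: the name insert_overwrite_table is invented, and relation handling is simplified to the bare minimum.

{% materialization insert_overwrite_table, adapter='snowflake' %}

  {%- set target_relation = this.incorporate(type='table') -%}
  {%- set existing_relation = load_cached_relation(this) -%}

  {{ run_hooks(pre_hooks) }}

  {% if existing_relation is none %}
    {# First run: the table doesn't exist yet, so create it normally. #}
    {% call statement('main') -%}
      create table {{ target_relation }} as ({{ sql }})
    {%- endcall %}
  {% else %}
    {# Subsequent runs: truncate-and-load atomically, keeping the table
       object (and its grants, tags, and policies) in place. #}
    {% call statement('main') -%}
      insert overwrite into {{ target_relation }} ({{ sql }})
    {%- endcall %}
  {% endif %}

  {{ run_hooks(post_hooks) }}
  {{ return({'relations': [target_relation]}) }}

{% endmaterialization %}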

@ian-andriot-cdi

ian-andriot-cdi commented Sep 11, 2024

Could this be an incremental strategy within the incremental materialization, instead of an option on table? That might be a simple way to implement this. There's even schema-change logic already built into the incremental materialization.

Edit: sorry, I see this was already discussed somewhat, in terms of whether it needs to be table or incremental. What's the difference between a new incremental strategy and a table option with full refresh set to false?

@sanjayan1995

Is this issue still open? Is there a workaround for INSERT OVERWRITE in dbt?
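One hedged workaround pattern (a sketch, not an official dbt feature): keep the table object alive by truncating in a pre-hook and appending in the same run. Unlike INSERT OVERWRITE this is two statements, so it is not atomic; stg_orders is a placeholder.

{{ config(
    materialized='incremental',
    incremental_strategy='append',
    pre_hook="truncate table if exists {{ this }}"
) }}

select * from {{ ref('stg_orders') }}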

@JHYL0406

JHYL0406 commented Dec 4, 2024

I also support this change, whichever way it's implemented.
We have tables in Snowflake with tags on the columns for data masking policies.
We are currently using a custom materialisation to truncate the table and then refresh the data into it, so that the tags are never removed.
If we use the normal table materialisation, it loses the tags, and the masking is removed until the tags are reapplied.
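That use case can be illustrated directly (all names are made up): a masking policy set on a column survives INSERT OVERWRITE because the column itself is never dropped.

alter table marts.customers
  modify column email set masking policy governance.mask_email;

-- Rows are fully replaced, but the column-level masking policy (and any
-- tags) remain attached, because the table object survives the load.
insert overwrite into marts.customers
    select * from staging.customers;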

@mikealfare mikealfare added the pkg:dbt-snowflake Issue affects dbt-snowflake label Jan 15, 2025
@thomas-lochner

thomas-lochner commented Jan 23, 2025

@markproctor1 Would you be willing to share the custom materialization you are using? I'd also like to offer an alternative opinion on the on_schema_change issue. For us, we have very few incremental models and largely use the table materialization. The main benefit of this change, for me, is the ability to use Snowflake Time Travel on these models/tables. We are not changing our model schemas very often, so I would personally like to see the option to specify something like on_schema_change='full_refresh' (it doesn't have to be exactly this config; the example is just to portray the idea), where the table is simply re-created in the event of a schema change.

I would love to help with this; however, I am struggling a bit to understand the code, to be honest.

EDIT: I now realize that on_schema_change='sync_all_columns' would be better suited, as it would avoid dropping the table altogether. I hate to complicate the discussion, but I personally feel the on_schema_change config options would be beneficial, and thus think an incremental strategy would be preferable. I see this was already discussed, but wanted to give my input.

@thomas-lochner

thomas-lochner commented Jan 24, 2025

Apologies for the multiple comments, but I also wanted to add that using the append strategy with on_schema_change='sync_all_columns' gets us almost to what we want; the only gap is that it runs INSERT INTO as opposed to INSERT OVERWRITE INTO.
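For reference, the setup described above looks like this (the model name is a placeholder):

{{ config(
    materialized='incremental',
    incremental_strategy='append',
    on_schema_change='sync_all_columns'
) }}

select * from {{ ref('stg_orders') }}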

The link to Matt Winkler's GitHub (in the original post) seems to be exactly what I am looking for actually.

@mikealfare mikealfare transferred this issue from dbt-labs/dbt-snowflake Jan 25, 2025
@amychen1776
Contributor

amychen1776 commented Jan 28, 2025

Hi folks - I wanted to provide an update on this feature request. Our Adapters engineering team will be picking up this issue soon and you should see an update once we have a more confirmed timeline. If you have any questions, feel free to tag me to discuss :)

@amychen1776
Contributor

amychen1776 commented Feb 6, 2025

As an update - we have picked up this issue and are actively working on it.
One change to the design that we have made is that we will implement this as an incremental materialization strategy. First, I want to say I appreciate the lengthy conversation above; it was helpful to dig into all the tradeoffs and explanations. Seriously, I stupidly love this community and all the smart humans in it.
Ultimately, the deciding factors were user experience and ease of maintenance. dbt's mental model is that a table materialization drops the table on every run unless a user overrides it (edge cases), and as part of that we have a lot of logic written around dropping. Incremental materializations do not drop the table on subsequent runs, and that is the expected outcome for this feature. This is a paradigm our users are comfortable with and understand, despite the difference in the SQL that Snowflake executes under the hood. To make it easier to onboard to, and to know when to use this configuration, we will make it an incremental materialization strategy (knowing that a lot of the incremental configurations will not be applicable). I don't see this as different enough to justify a new materialization either.
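For the curious: recent dbt versions dispatch custom incremental strategies through macros named get_incremental_<strategy>_sql(arg_dict), so a minimal sketch of the approach could look like the following. This is illustrative only; the macro name and the details of the shipped implementation (see #855) may differ.

{% macro snowflake__get_incremental_insert_overwrite_sql(arg_dict) %}

  {%- set target_relation = arg_dict["target_relation"] -%}
  {%- set temp_relation = arg_dict["temp_relation"] -%}
  {%- set dest_columns = arg_dict["dest_columns"] -%}
  {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

  {# Truncate-and-load in one atomic statement; the table object and its
     grants/tags/policies survive, unlike with create or replace. #}
  insert overwrite into {{ target_relation }} ({{ dest_cols_csv }})
  select {{ dest_cols_csv }}
  from {{ temp_relation }}

{% endmacro %}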
