ADAP-544: Add configuration change options #659

mikealfare · 2023-06-14T05:05:06Z

resolves #603

Description

Implement dynamic tables as a materialization option for Snowflake.

Workflow:

Refresh existing dynamic table with no changes
Accept and process changes to a dynamic table:
- target_lag: alter dynamic table in place
- warehouse: full refresh

Configuration:

on_configuration_change: determines the action to take when changes are detected:
- apply: full refresh on warehouse changes, alter on target_lag changes
- continue: soft fail, skip the processing of this model, but process downstream
- fail: fail entire run

Checklist

I have read the contributing guide and understand what's expected of me
I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have opened an issue to add/update docs, or docs changes are not required/relevant for this PR
I have run changie new to create a changelog entry

github-actions · 2023-06-14T05:05:24Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-snowflake contributing guide.

dbt/adapters/snowflake/relation.py

VersusFacit · 2023-06-22T01:04:22Z

dbt/adapters/snowflake/relation.py

+        return SnowflakeRelationType
+
+    @classmethod
+    def from_runtime_config(cls, runtime_config: RuntimeConfigObject) -> RelationConfigBase:


What does this function mean? I tried figuring that out from reading where it is invoked. That said, it's not immediately clear. You want the model from the runtime config which the materialization invokes? Just a lot of indirection here.

I definitely need a docstring here; I'll add it. The idea is to produce a RelationConfig from the RuntimeConfigObject; in this case it would produce a SnowflakeDynamicTableConfig instance. RuntimeConfigObject contains all of the information needed to create a SnowflakeDynamicTableConfig, plus a lot more. There are two key differences. The latter is in the parlance of the Snowflake object, and not the generic RuntimeConfigObject. And producing the latter provides a much better place for validation checks and structural checks; we don't need to do it at the SnowflakeRelation level or in the jinja.

VersusFacit · 2023-06-22T01:04:59Z

dbt/adapters/snowflake/relation.py

+        model_node: ModelNode = runtime_config.model
+        relation_type = SnowflakeRelationType(model_node.config.materialized)
+
+        if relation_config := cls.relation_configs.get(relation_type):


Oh the walrus operator is now usable by us? Oh good.

What's your intended semantics? Non-Optional assignment semantics frighten me. What does get do if relation_type is None?

In this case we control the value, and it can't be None because relation_type is a Enum. But I do explicitly use that in all of the config parsers. Passing None is effectively not passing it at all. I think I covered the scenarios there, where None is not actually a valid value for that config parameter. But agreed, we need to watch out for that and check for the existence of the key in those scenarios. I could also make the argument that we don't allow that value in our interface.

VersusFacit · 2023-06-22T01:07:39Z

dbt/adapters/snowflake/relation_configs/dynamic_table.py

+
+@dataclass
+class SnowflakeDynamicTableConfigChangeset:
+    target_lag: Optional[SnowflakeDynamicTableTargetLagConfigChange] = None


Can't remember. Could you just have the vars without the = None and get the same thing? Are you just erring to explicit? Is one considered cleaner? Can I find a way to ask more questions about this one line?

If I remember correctly, dataclass yells at you if you have an Optional and you don't provide the variable because you haven't defined a default, even if that default is None. I think it doesn't assume. That being said, these are one of two varieties. There can either be a change that can only happen once (e.g. target_lag), or a set of changes all of the same type (e.g. indexes on Postgres). So I'm also being explicit that the null case here is None and not set(). I've also thought about collapsing this entirely and just having one single set of changes, but that leaves the parsing of "what type of change is this?" to the caller, which in this case is jinja.

VersusFacit · 2023-06-22T01:08:24Z

dbt/adapters/snowflake/relation.py

+        existing_dynamic_table = SnowflakeDynamicTableConfig.from_relation_results(
+            relation_results
+        )
+        assert isinstance(existing_dynamic_table, SnowflakeDynamicTableConfig)


Leftover assert? Should it be an exception?

This is to make the linter happy. from_relation_results() returns a generic RelationConfigBase() because it's on the base class. The base class has no platform-specific attributes like target_lag, warehouse, etc. Instead of doing a # type: ignore and dealing with the warnings in the IDE, I solve both with this assert.

So what happens when this crashes on a user for some reason down the line against all our assumptions? We fine with this being the possible bug report?

Yeah, I should probably wrap that in a try / catch and make it pretty, fair point.

VersusFacit · 2023-06-22T01:09:51Z

dbt/adapters/snowflake/relation.py

+    @classmethod
+    def dynamic_table_config_changeset(
+        cls,
+        dynamic_table: SnowflakeDynamicTableConfig,


So this is pointing to what we're creating? Should this be like target_ prefixed or something like that to highlight this is the to-be-built copy of an existing table?

I'm definitely focusing on the new snapshot, which lives in the user config, as the "primary" view of the dynamic table. I did that because that's where I am when I'm running this, in the future looking back at the past. However, as noted above, this should probably change. I also don't like relation_results since it's very vague. This was the result of several back and forth's in reviews. I think I just might bite the bullet, and do the parsing for the existing_dynamic_table outside of this method as well, and then wind up with existing_dynamic_table and new_dynamic_table as the arguments, which is more clear.

I'd love a new_dynamic_table name everywhere personally!

VersusFacit · 2023-06-22T01:11:26Z

dbt/adapters/snowflake/relation.py

+
+        if config_changeset.has_changes:
+            return config_changeset
+        return None


nit: complete this with an else

::gets on soap box and raises pinky:: My understanding is that it's more idiomatic to return the None outside of the else block since the else block isn't doing anything. That's how I've generally seen it, so the else: return None construct looks strange to me, as if we could get past it somehow and not return anything (because it's indented). A compromise could be:

return config_changeset if config_changeset.has_changes else None

Bruh I trust you but wow. Conventions be something. I'm happy with the compromise hehe

VersusFacit · 2023-06-22T01:12:54Z

dbt/adapters/snowflake/relation.py

+        )
+
+    @classmethod
+    def dynamic_table_config_changeset(


So this function is building the changeset between the new and existing version of the table? I find the name doesn't capture these semantics, and it would help self-document

I went around the block on naming this and I agree that it's not super intuitive. I'll see if I can come up with something.

For context, here was my thought process leading up to this name. At this point in the process, we know that there's an existing database object and an object in the config that points to that database object, which may or may not have changes. Both things are really the same thing, the same dynamic table. We're merely taking another snapshot and then determining if the state of the object (minus data) in the config matches the already existing object in the database. Since they both represent the same dynamic table at two points in time, we're effectively determining the changeset between those two points in time on the same dynamic table; hence dynamic_table_config_changeset(). This makes me wonder if maybe the method name isn't that bad, but the arguments should be updated, as you point out below.

I'm fine with more specific args and doc strings!

.changes/unreleased/Features-20230620-224944.yaml

VersusFacit · 2023-06-22T01:15:35Z

dbt/adapters/snowflake/relation_configs/base.py

+    @staticmethod
+    def _render_part(component: ComponentName, value: str) -> Optional[str]:
+        if SnowflakeIncludePolicy().get_part(component):
+            if SnowflakeQuotePolicy().get_part(component):


Is the QuotePolicy intended to be subordinate to the IncludePolicy?

If we're not including that part, then we don't care about the quote policy for that part. But if we are including that part, then we need to see if we're quoting it.

VersusFacit · 2023-06-22T01:18:22Z

dbt/adapters/snowflake/relation_configs/dynamic_table.py

    query: str
    target_lag: SnowflakeDynamicTableTargetLagConfig
    warehouse: str
+
+    @property
+    def path(self) -> str:


Howabout relation path or db_relation_path?

Since this will always be used in the form dynamic_table.path, I'm not keen on dynamic_table.relation_path since it feels like doublespeak. I chose path because I believe we call it that in dbt-core somewhere. However, path is generic; what this really is is the "fully qualified path". And since we're in relation_config land, we should be talking in database parlance. I'll update it to fully_qualified_path, but let me know your thoughts on that name.

I love fully_qualified_path.

VersusFacit · 2023-06-22T01:20:07Z

dbt/adapters/snowflake/relation_configs/dynamic_table.py

+    @property
+    def path(self) -> str:
+        return ".".join(
+            part for part in [self.database_name, self.schema_name, self.name] if part is not None


So what happens if self.schema_name is None but the other's aren't?

lol, I thought the same thing, but this is how we roll in dbt-core. fwiw, those three attributes are required on this class, but could be provided as "". I'm fine dealing with that edge case since the user would effectively be choosing to break it. That becomes more of an issue when this winds up in a base class in dbt-core (down the road). If you're still concerned, I would address it with a validation rule, checking that all three parts are of non-zero length (rather than complicate this logic).

I don't love this but I'll take the explanation.

I put validation checks here.

dbt/adapters/snowflake/relation_configs/dynamic_table.py

VersusFacit · 2023-06-22T01:22:19Z

dbt/adapters/snowflake/relation_configs/dynamic_table.py

@@ -21,6 +35,147 @@ class SnowflakeDynamicTableConfig(RelationConfigBase):
    """

    name: str
+    schema_name: str


Am confused. Down on line 67 you are using if not none assignment style guards' but these are str (as opposed to options?)

True, I did that in cases when there are defaults because passing in my_arg=None does not result in the default for my_arg, it results in None. However, I don't have defaults. I'll remove those guards below.

VersusFacit · 2023-06-22T01:32:19Z

dbt/include/snowflake/macros/materializations/dynamic_table/ddl.sql

+    create or replace dynamic table {{ dynamic_table.path }}
+        target_lag = '{{ dynamic_table.target_lag }}'
+        warehouse = {{ dynamic_table.warehouse }}
+        as ({{ dynamic_table.query }})


Okay so I think this is one of those places that without spacing to

as ( {{ dynamic_table.query }} )

would result in bugs (e.g. user puts comments at bottom of file).

We've seen this in RS (or whichever db 🙈 ). Did you prescrub out comments? If not, we should probably code defensively here.

Ah yes, I forgot about that. Nice catch. To be honest, I made it look like that because there were extra returns in the logs and this looked prettier, lol. Will change.

VersusFacit · 2023-06-22T01:34:12Z

dbt/adapters/snowflake/relation_configs/target_lag.py

+                validation_check=(
+                    (
+                        self.duration >= 1
+                        and self.period != SnowflakeDynamicTableTargetLagPeriod.seconds


seconds -> minute?

And as insane as it sounds, could someone say 0 hours by chance? 🙊 Might want to guard against.

I think this is already guarding against the 0 hours scenario because it's requiring a positive number. But it's a bit hidden since I do a check against the type and then combine it with another check. I'll break this into two checks: positive integer, and >= 60 if your unit is seconds.

Oh excellent point about the boolean logic though!

VersusFacit · 2023-06-22T01:36:52Z

dbt/adapters/snowflake/relation_configs/target_lag.py

+        return target_lag
+
+    @classmethod
+    def parse_model_node(cls, model_node: ModelNode) -> dict:


I don't understand this paradigm you've made but I want to! Who is calling this? Why is the name so generic when the result is so specific to the class? (I see you do this elsewhere)

It's a poor man's mashumaro, pydantic, marshmallow, etc. I could see replacing this in the long term with a framework like that. The idea is that there are several ways to create this "generic" dynamic view representation. This dataclass is just one giant set of parsers with validation rules. The two parsers I found so far is database > config and user-config > config. I wouldn't be surprised if we wind up with more. Maybe someone wants to get crazy and do a Redshift MV user-config > Snowflake DT config to do a migration. Whatever it is, this config class is our anchor. And we generally want two representations of this config. We want a nice shiny, python object with dot path-ing, etc. so that the jinja reads cleanly, other dependent code reads cleanly, etc. And then we usually want the dictionary version that is ubiquitous; it's nice for logs, it translates out of python nicely if necessary (basically json). In order to do that in an hierarchical object, each object needs both and we need to split apart the foreign format > dict and dict > object computation. A framework like mashumaro does all of that behind the scenes so you don't need to worry about it. However, we still have the added complexity of needing more than one parser since our objects come from more than one source. I'd be happy to discuss this further, I think it need sync discussions to make sense.

VersusFacit · 2023-06-22T01:38:23Z

dbt/adapters/snowflake/relation_configs/target_lag.py

+        target_lag: str = model_node.config.extra.get("target_lag")
+        try:
+            duration, period = target_lag.split(" ")
+        except IndexError:


Should this also catch the AttributeError when target_lag is None (since get can do that) unless that's guarded against elsewhere?

It might be in dbt/adapters/snowflake/relation_configs/dynamic_table.py around line 78? Still, if you ever instantiate this without the AttributeError check, you'll get an exception.

Yeah, that sounds right. I think I originally had this in an if:

if target_lag := model_node.config.extra.get("target_lag"): try: duration, period = target_lag.split(" ") except IndexError: duration, period = None, None

But then I removed the if and forgot about that case. I'll update.

VersusFacit · 2023-06-22T01:42:17Z

dbt/adapters/snowflake/relation_configs/dynamic_table.py

+        return config_dict
+
+    @classmethod
+    def parse_relation_results(cls, relation_results: RelationResults) -> dict:


Where does this method get used? For output or something else?

It get's used in super().from_relation_results(), which merely chains together this and from_dict(). This may not make as much sense here, but I'm going for a consistent look/feel across adapters and RelationConfigBase classes. When there are attributes which are themselves RelationConfigBase classes, e.g. target_lag below, then we need this method as a stopping point to produce a dict. Hence it gets called if this instance were an attribute on a bigger object.

tl;dr; Consistency? And to fit into the base class framework

VersusFacit · 2023-06-22T01:44:39Z

dbt/include/snowflake/macros/materializations/dynamic_table/ddl.sql

+            "text",
+            "target_lag",
+            "warehouse"
+        from table(result_scan(last_query_id()))


concurrency nightmare possible? Maybe.....we could do a dynamic query call? Or are you confident this is thread-safe/otherwise-collision-safe?

This was the recommended way to read the results from show dynamic tables according to the docs I found. I could see if I can find that again if you want. I'm not tied to it though; it worked so I went with it.

VersusFacit · 2023-06-22T01:45:42Z

dbt/include/snowflake/macros/materializations/dynamic_table/ddl.sql

@@ -1,46 +1,96 @@
-{% macro snowflake__get_alter_dynamic_table_as_sql(


Is this file a bunch of helper DDL for the snowflake methods for building out this table? If I'm right, would love to see that documented at filetop. If I'm wrong, another reason to document our intentions 😁

Yes, it's the former. It's the pieces that go into dynamic_table/materialization.sql. When the materialization is in dbt-core, this boils down to just a file dynamic_table.sql. I broke it out here to emphasize that the materialization is local (and because it's a lot of code with slightly different uses). Agreed, I should (and will) add a doc block at the top.

VersusFacit · 2023-06-22T01:47:20Z

dbt/adapters/snowflake/relation_configs/dynamic_table.py

+
+        """
+        open_paren = query.find("as (") + len("as (")
+        close_paren = query.rindex(")")


Damn, that's stressful to me -- so many edge cases -- but it does seem like you've accounted to the 5 my brain came up with!

I basically looked at what Snowflake returned on the most basic dynamic table. That being said, we control the format of what goes in there since we're writing the DDL that wraps this statement (the user's SQL). We're effectively unwrapping the statement here. I'm sure we'll find an edge case eventually and this will grow. The upside? We can actually compare the query of the object in the database and the query the user provided to see if there's a difference before deciding to deploy (this could be used to simplify view processing in the future).

dbt/adapters/snowflake/relation_configs/dynamic_table.py

VersusFacit · 2023-06-22T01:49:53Z

dbt/include/snowflake/macros/materializations/dynamic_table/materialization.sql

    {% elif full_refresh_mode or not existing_relation.is_dynamic_table %}
-        {% set build_sql = snowflake__get_replace_dynamic_table_as_sql(target_relation, sql, existing_relation, backup_relation, intermediate_relation) %}
+        {% set build_sql = snowflake__replace_dynamic_table_sql(dynamic_table, existing_relation, backup_relation, intermediate_relation) %}
    {% else %}

        -- get config options


Dangling/incomplete comment?

This is very similar to the new built-in materialized view materialization in dbt-core (if you find/replace "dynamic table" with "materialized view", you might even say they're the same). That being said, I had these comments at the time to try and lead myself (and the next unfortunate soul) through this giant, nested, jinja-dwelling if block. But it goes back to your comment of "comments should say why, not what". And I think what this is suggesting is that maybe it should be its own macro, which would self-document what I'm doing here. I'll try some combination of all of that to make this better.

That'd be rad!

VersusFacit

WOW! That's a lot of code! Still, all the boilerplate you wrote made it pretty starightforward to read so kudos.

Fleid · 2023-06-28T22:50:46Z

Refresh existing dynamic table with no changes

Hey @mikealfare looking at the original spec, the DT should not be refreshed if there are no changes and their status isn't failed.
I've re-worded the expected lifecycle here, hopefully clarifying it.

full refresh on warehouse changes

I'm not sure what that means? Is that still an ALTER statement? Or are you doing a CREATE OR REPLACE?

… into feature/dynamic-tables/adap-544

# Conflicts: # dbt/include/snowflake/macros/materializations/dynamic_table/ddl.sql # tests/functional/adapter/dynamic_table_tests/fixtures.py # tests/functional/adapter/dynamic_table_tests/test_dynamic_table.py

mikealfare · 2023-09-11T21:34:34Z

This PR is way out of date. The branch will be saved for reference, but the PR is stale.

mikealfare added 2 commits June 13, 2023 23:17

first cut at adding changes, moving to passing dynamic_table by default

17f78f9

first cut at adding changes, moving to passing dynamic_table by default

61e5f81

mikealfare self-assigned this Jun 14, 2023

cla-bot bot added the cla:yes label Jun 14, 2023

mikealfare changed the title ~~Feature/dynamic tables/adap 544~~ ADAP-544: Add configuration change options Jun 14, 2023

mikealfare and others added 5 commits June 14, 2023 01:06

Merge branch 'main' into feature/dynamic-tables/adap-544

502184b

Merge branch 'main' into feature/dynamic-tables/adap-544

890c774

first cut at adding changes, moving to passing dynamic_table by default

25c31a3

Merge branch 'main' into feature/dynamic-tables/adap-544

91c97ec

first cut at adding changes, moving to passing dynamic_table by default

f9e80b6

mikealfare marked this pull request as ready for review June 21, 2023 02:50

mikealfare requested a review from a team as a code owner June 21, 2023 02:50

mikealfare requested review from McKnight-42 and Fleid and removed request for McKnight-42 June 21, 2023 02:50

VersusFacit reviewed Jun 22, 2023

View reviewed changes

dbt/adapters/snowflake/relation.py Outdated Show resolved Hide resolved

VersusFacit reviewed Jun 22, 2023

View reviewed changes

.changes/unreleased/Features-20230620-224944.yaml Show resolved Hide resolved

VersusFacit reviewed Jun 22, 2023

View reviewed changes

dbt/adapters/snowflake/relation_configs/dynamic_table.py Outdated Show resolved Hide resolved

VersusFacit reviewed Jun 22, 2023

View reviewed changes

dbt/adapters/snowflake/relation_configs/dynamic_table.py Outdated Show resolved Hide resolved

VersusFacit reviewed Jun 22, 2023

View reviewed changes

mikealfare marked this pull request as draft June 22, 2023 18:21

mikealfare added 6 commits June 22, 2023 19:34

Merge branch 'main' into feature/dynamic-tables/adap-544

da7cf45

docstring for from_runtime_config

e967d44

add corresponding from_relation_results for from_runtime_config

c12e00f

addressed feedback on relation

7378207

addressed feedback on PR#659

ee123b7

fixed some references between dynamic_table, schema, and database

aa7bfd7

mikealfare marked this pull request as ready for review June 26, 2023 20:51

removed snowflake__ namespace as it is unnecessary

31c4410

mikealfare and others added 5 commits July 5, 2023 15:26

Merge branch 'main' into feature/dynamic-tables/adap-544

f276a72

Merge remote-tracking branch 'origin/feature/dynamic-tables/adap-544'…

691b870

… into feature/dynamic-tables/adap-544

Merge branch 'main' into feature/dynamic-tables/adap-544

1e534f7

# Conflicts: # dbt/include/snowflake/macros/materializations/dynamic_table/ddl.sql # tests/functional/adapter/dynamic_table_tests/fixtures.py # tests/functional/adapter/dynamic_table_tests/test_dynamic_table.py

update to use new framework in dbt-core in alignment with adap-608

532951b

update to use new framework in dbt-core in alignment with adap-608

a430e05

mikealfare marked this pull request as draft July 25, 2023 14:59

mikealfare closed this Sep 11, 2023

mikealfare deleted the feature/dynamic-tables/adap-544 branch July 17, 2024 23:58

		@@ -1,46 +1,96 @@
		{% macro snowflake__get_alter_dynamic_table_as_sql(

ADAP-544: Add configuration change options #659

ADAP-544: Add configuration change options #659

Conversation

mikealfare commented Jun 14, 2023 • edited Loading

Description

Checklist

github-actions bot commented Jun 14, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

VersusFacit left a comment

Choose a reason for hiding this comment

Fleid commented Jun 28, 2023 • edited Loading

mikealfare commented Sep 11, 2023

mikealfare commented Jun 14, 2023 •

edited

Loading

VersusFacit Jun 22, 2023 •

edited

Loading

VersusFacit Jun 22, 2023 •

edited

Loading

VersusFacit Jun 22, 2023 •

edited

Loading

VersusFacit Jun 22, 2023 •

edited

Loading

VersusFacit Jun 22, 2023 •

edited

Loading

VersusFacit Jun 22, 2023 •

edited

Loading

VersusFacit Jun 22, 2023 •

edited

Loading

Fleid commented Jun 28, 2023 •

edited

Loading