ADAP-544: Add configuration change options #659
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-snowflake contributing guide.
dbt/adapters/snowflake/relation.py
Outdated
return SnowflakeRelationType

@classmethod
def from_runtime_config(cls, runtime_config: RuntimeConfigObject) -> RelationConfigBase:
What does this function mean? I tried figuring that out from reading where it is invoked. That said, it's not immediately clear. You want the model from the runtime config which the materialization invokes? Just a lot of indirection here.
I definitely need a docstring here; I'll add it. The idea is to produce a RelationConfig from the RuntimeConfigObject; in this case it would produce a SnowflakeDynamicTableConfig instance. RuntimeConfigObject contains all of the information needed to create a SnowflakeDynamicTableConfig, plus a lot more. There are two key differences. The latter is in the parlance of the Snowflake object, and not the generic RuntimeConfigObject. And producing the latter provides a much better place for validation checks and structural checks; we don't need to do it at the SnowflakeRelation level or in the jinja.
dbt/adapters/snowflake/relation.py
Outdated
model_node: ModelNode = runtime_config.model
relation_type = SnowflakeRelationType(model_node.config.materialized)

if relation_config := cls.relation_configs.get(relation_type):
Oh the walrus operator is now usable by us? Oh good.
What's your intended semantics? Non-Optional assignment semantics frighten me. What does get do if relation_type is None?
In this case we control the value, and it can't be None because relation_type is an Enum. But I do explicitly use that in all of the config parsers. Passing None is effectively not passing it at all. I think I covered the scenarios there, where None is not actually a valid value for that config parameter. But agreed, we need to watch out for that and check for the existence of the key in those scenarios. I could also make the argument that we don't allow that value in our interface.
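For illustration only (not the actual adapter code), here is a self-contained sketch of the get-plus-walrus semantics under discussion; a missing or None key just falls through rather than raising:

configs = {"dynamic_table": "SnowflakeDynamicTableConfig"}  # stand-in registry

for relation_type in ("dynamic_table", "view", None):
    # dict.get returns None for a missing key (including a None key), so the
    # walrus-guarded branch is simply skipped instead of raising.
    if relation_config := configs.get(relation_type):
        print(f"{relation_type}: dispatch to {relation_config}")
    else:
        print(f"{relation_type}: no registered config, fall through")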
@dataclass
class SnowflakeDynamicTableConfigChangeset:
    target_lag: Optional[SnowflakeDynamicTableTargetLagConfigChange] = None
Can't remember. Could you just have the vars without the = None and get the same thing? Are you just erring toward explicit? Is one considered cleaner? Can I find a way to ask more questions about this one line?
If I remember correctly, dataclass yells at you if you have an Optional and you don't provide the variable because you haven't defined a default, even if that default is None. I think it doesn't assume. That being said, these are one of two varieties. There can either be a change that can only happen once (e.g. target_lag), or a set of changes all of the same type (e.g. indexes on Postgres). So I'm also being explicit that the null case here is None and not set(). I've also thought about collapsing this entirely and just having one single set of changes, but that leaves the parsing of "what type of change is this?" to the caller, which in this case is jinja.
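A quick self-contained check of that recollection (illustrative names only): the Optional annotation alone does not create a default, so without = None the field is still required at construction time.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Changeset:
    target_lag: Optional[str] = None  # drop "= None" and Changeset() raises TypeError

print(Changeset())           # Changeset(target_lag=None)
print(Changeset("1 hour"))   # Changeset(target_lag='1 hour')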
dbt/adapters/snowflake/relation.py
Outdated
existing_dynamic_table = SnowflakeDynamicTableConfig.from_relation_results(
    relation_results
)
assert isinstance(existing_dynamic_table, SnowflakeDynamicTableConfig)
Leftover assert? Should it be an exception?
This is to make the linter happy. from_relation_results() returns a generic RelationConfigBase() because it's on the base class. The base class has no platform-specific attributes like target_lag, warehouse, etc. Instead of doing a # type: ignore and dealing with the warnings in the IDE, I solve both with this assert.
So what happens when this crashes on a user for some reason down the line against all our assumptions? We fine with this being the possible bug report?
Yeah, I should probably wrap that in a try/catch and make it pretty, fair point.
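A hedged sketch of what that wrapping might look like; the explicit check narrows the type for the linter just like the assert, but raises a descriptive error instead of a bare AssertionError (the class below is a stand-in, not the real implementation):

class SnowflakeDynamicTableConfig:  # stand-in for the real config class
    ...

def narrow_to_dynamic_table_config(config: object) -> SnowflakeDynamicTableConfig:
    # isinstance narrows the type for static checkers, and the raise gives users
    # a readable error if the base-class factory ever returns something unexpected.
    if not isinstance(config, SnowflakeDynamicTableConfig):
        raise TypeError(f"Expected SnowflakeDynamicTableConfig, got {type(config).__name__}")
    return config

narrow_to_dynamic_table_config(SnowflakeDynamicTableConfig())  # passes; anything else raises TypeError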
dbt/adapters/snowflake/relation.py
Outdated
@classmethod
def dynamic_table_config_changeset(
    cls,
    dynamic_table: SnowflakeDynamicTableConfig,
So this is pointing to what we're creating? Should this be like target_ prefixed or something like that to highlight this is the to-be-built copy of an existing table?
I'm definitely focusing on the new snapshot, which lives in the user config, as the "primary" view of the dynamic table. I did that because that's where I am when I'm running this, in the future looking back at the past. However, as noted above, this should probably change. I also don't like relation_results since it's very vague. This was the result of several back-and-forths in reviews. I think I might just bite the bullet and do the parsing for the existing_dynamic_table outside of this method as well, and then wind up with existing_dynamic_table and new_dynamic_table as the arguments, which is clearer.
I'd love a new_dynamic_table name everywhere personally!
dbt/adapters/snowflake/relation.py
Outdated
if config_changeset.has_changes:
    return config_changeset
return None
nit: complete this with an else
::gets on soap box and raises pinky:: My understanding is that it's more idiomatic to return the None outside of the else block since the else block isn't doing anything. That's how I've generally seen it, so the else: return None construct looks strange to me, as if we could get past it somehow and not return anything (because it's indented). A compromise could be:

return config_changeset if config_changeset.has_changes else None
Bruh I trust you but wow. Conventions be something. I'm happy with the compromise hehe
dbt/adapters/snowflake/relation.py
Outdated
)

@classmethod
def dynamic_table_config_changeset(
So this function is building the changeset between the new and existing version of the table? I find the name doesn't capture these semantics, and it would help self-document
I went around the block on naming this and I agree that it's not super intuitive. I'll see if I can come up with something.
For context, here was my thought process leading up to this name. At this point in the process, we know that there's an existing database object and an object in the config that points to that database object, which may or may not have changes. Both things are really the same thing, the same dynamic table. We're merely taking another snapshot and then determining if the state of the object (minus data) in the config matches the already existing object in the database. Since they both represent the same dynamic table at two points in time, we're effectively determining the changeset between those two points in time on the same dynamic table; hence dynamic_table_config_changeset(). This makes me wonder if maybe the method name isn't that bad, but the arguments should be updated, as you point out below.
I'm fine with more specific args and doc strings!
@staticmethod
def _render_part(component: ComponentName, value: str) -> Optional[str]:
    if SnowflakeIncludePolicy().get_part(component):
        if SnowflakeQuotePolicy().get_part(component):
Is the QuotePolicy intended to be subordinate to the IncludePolicy?
If we're not including that part, then we don't care about the quote policy for that part. But if we are including that part, then we need to see if we're quoting it.
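A toy, self-contained version of that nesting (plain booleans stand in for the policy objects): the include policy gates whether the part is rendered at all; only then does the quote policy decide how it is rendered.

from typing import Optional

def render_part(include: bool, quote: bool, value: str) -> Optional[str]:
    if include:
        return f'"{value}"' if quote else value
    return None  # excluded parts never reach the quoting decision

print(render_part(True, True, "my_schema"))    # "my_schema" (quoted)
print(render_part(True, False, "my_schema"))   # my_schema
print(render_part(False, True, "my_schema"))   # None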
query: str
target_lag: SnowflakeDynamicTableTargetLagConfig
warehouse: str

@property
def path(self) -> str:
How about relation_path or db_relation_path?
Since this will always be used in the form dynamic_table.path, I'm not keen on dynamic_table.relation_path since it feels like doublespeak. I chose path because I believe we call it that in dbt-core somewhere. However, path is generic; what this really is is the "fully qualified path". And since we're in relation_config land, we should be talking in database parlance. I'll update it to fully_qualified_path, but let me know your thoughts on that name.
I love fully_qualified_path.
@property
def path(self) -> str:
    return ".".join(
        part for part in [self.database_name, self.schema_name, self.name] if part is not None
So what happens if self.schema_name is None but the others aren't?
lol, I thought the same thing, but this is how we roll in dbt-core. fwiw, those three attributes are required on this class, but could be provided as "". I'm fine dealing with that edge case since the user would effectively be choosing to break it. That becomes more of an issue when this winds up in a base class in dbt-core (down the road). If you're still concerned, I would address it with a validation rule, checking that all three parts are of non-zero length (rather than complicate this logic).
I don't love this but I'll take the explanation.
I put validation checks here.
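For what it's worth, a quick self-contained illustration of the edge case being discussed: a missing middle part silently collapses the path rather than failing, which is exactly what the validation checks catch.

parts = ["my_database", None, "my_table"]  # schema missing
print(".".join(part for part in parts if part is not None))  # my_database.my_table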
@@ -21,6 +35,147 @@ class SnowflakeDynamicTableConfig(RelationConfigBase):
    """

    name: str
    schema_name: str
Am confused. Down on line 67 you are using "if not None" assignment-style guards, but these are str (as opposed to optional)?
True, I did that in cases when there are defaults because passing in my_arg=None does not result in the default for my_arg, it results in None. However, I don't have defaults. I'll remove those guards below.
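A tiny self-contained example of that point (illustrative function, not adapter code): passing None explicitly is not the same as omitting the argument.

def render_target_lag(target_lag: str = "5 minutes") -> str:
    return target_lag

print(render_target_lag())                 # 5 minutes (default applies)
print(render_target_lag(target_lag=None))  # None (default is bypassed)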
create or replace dynamic table {{ dynamic_table.path }}
target_lag = '{{ dynamic_table.target_lag }}'
warehouse = {{ dynamic_table.warehouse }}
as ({{ dynamic_table.query }})
Okay so I think this is one of those places where, without spacing it out to

as (
{{ dynamic_table.query }}
)

it would result in bugs (e.g. user puts comments at bottom of file). We've seen this in RS (or whichever db 🙈 ). Did you pre-scrub out comments? If not, we should probably code defensively here.
Ah yes, I forgot about that. Nice catch. To be honest, I made it look like that because there were extra returns in the logs and this looked prettier, lol. Will change.
validation_check=(
    (
        self.duration >= 1
        and self.period != SnowflakeDynamicTableTargetLagPeriod.seconds
seconds -> minute?
And as insane as it sounds, could someone say 0 hours by chance? 🙊 Might want to guard against.
I think this is already guarding against the 0 hours scenario because it's requiring a positive number. But it's a bit hidden since I do a check against the type and then combine it with another check. I'll break this into two checks: positive integer, and >= 60 if your unit is seconds.
Oh excellent point about the boolean logic though!
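A hedged sketch of what the two split checks could look like (names and error type are illustrative, not the actual implementation):

def validate_target_lag(duration: int, period: str) -> None:
    # check 1: the duration must be a positive integer
    if duration < 1:
        raise ValueError("target_lag duration must be a positive integer")
    # check 2: the >= 60 floor described above when the unit is seconds
    if period == "seconds" and duration < 60:
        raise ValueError("target_lag in seconds must be at least 60")

validate_target_lag(5, "minutes")     # passes
# validate_target_lag(30, "seconds")  # would raise ValueError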
return target_lag

@classmethod
def parse_model_node(cls, model_node: ModelNode) -> dict:
I don't understand this paradigm you've made but I want to! Who is calling this? Why is the name so generic when the result is so specific to the class? (I see you do this elsewhere)
It's a poor man's mashumaro, pydantic, marshmallow, etc. I could see replacing this in the long term with a framework like that. The idea is that there are several ways to create this "generic" dynamic table representation. This dataclass is just one giant set of parsers with validation rules. The two parsers I found so far are database > config and user-config > config. I wouldn't be surprised if we wind up with more. Maybe someone wants to get crazy and do a Redshift MV user-config > Snowflake DT config to do a migration. Whatever it is, this config class is our anchor.

And we generally want two representations of this config. We want a nice shiny python object with dot path-ing, etc. so that the jinja reads cleanly, other dependent code reads cleanly, etc. And then we usually want the dictionary version that is ubiquitous; it's nice for logs, and it translates out of python nicely if necessary (basically json). In order to do that in a hierarchical object, each object needs both, and we need to split apart the foreign format > dict and dict > object computation. A framework like mashumaro does all of that behind the scenes so you don't need to worry about it. However, we still have the added complexity of needing more than one parser since our objects come from more than one source. I'd be happy to discuss this further; I think it needs a sync discussion to make sense.
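A miniature, self-contained version of that pattern (illustrative fields only): one parser per source format producing a dict, plus the typed object for dot-pathing, with asdict covering the log-friendly dictionary view.

from dataclasses import dataclass, asdict

@dataclass
class TargetLag:
    duration: int
    period: str

    @classmethod
    def parse_user_config(cls, user_config: dict) -> dict:
        # user-config (foreign format) -> dict; a second parser would handle the
        # database's "show dynamic tables" output the same way
        duration, period = user_config["target_lag"].split(" ")
        return {"duration": int(duration), "period": period}

target_lag = TargetLag(**TargetLag.parse_user_config({"target_lag": "5 minutes"}))
print(target_lag)          # TargetLag(duration=5, period='minutes')
print(asdict(target_lag))  # {'duration': 5, 'period': 'minutes'}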
target_lag: str = model_node.config.extra.get("target_lag")
try:
    duration, period = target_lag.split(" ")
except IndexError:
Should this also catch the AttributeError when target_lag is None (since get can do that), unless that's guarded against elsewhere?
It might be in dbt/adapters/snowflake/relation_configs/dynamic_table.py around line 78? Still, if you ever instantiate this without the AttributeError check, you'll get an exception.
Yeah, that sounds right. I think I originally had this in an if:
if target_lag := model_node.config.extra.get("target_lag"):
    try:
        duration, period = target_lag.split(" ")
    except IndexError:
        duration, period = None, None
But then I removed the if and forgot about that case. I'll update.
return config_dict

@classmethod
def parse_relation_results(cls, relation_results: RelationResults) -> dict:
Where does this method get used? For output or something else?
It gets used in super().from_relation_results(), which merely chains together this and from_dict(). This may not make as much sense here, but I'm going for a consistent look/feel across adapters and RelationConfigBase classes. When there are attributes which are themselves RelationConfigBase classes, e.g. target_lag below, then we need this method as a stopping point to produce a dict. Hence it gets called if this instance were an attribute on a bigger object.

tl;dr: Consistency? And to fit into the base class framework.
"text", | ||
"target_lag", | ||
"warehouse" | ||
from table(result_scan(last_query_id())) |
concurrency nightmare possible? Maybe.....we could do a dynamic query call? Or are you confident this is thread-safe/otherwise-collision-safe?
This was the recommended way to read the results from show dynamic tables according to the docs I found. I could see if I can find that again if you want. I'm not tied to it though; it worked so I went with it.
@@ -1,46 +1,96 @@
{% macro snowflake__get_alter_dynamic_table_as_sql( |
Is this file a bunch of helper DDL for the snowflake methods for building out this table? If I'm right, would love to see that documented at filetop. If I'm wrong, another reason to document our intentions 😁
Yes, it's the former. It's the pieces that go into dynamic_table/materialization.sql. When the materialization is in dbt-core, this boils down to just a file dynamic_table.sql. I broke it out here to emphasize that the materialization is local (and because it's a lot of code with slightly different uses). Agreed, I should (and will) add a doc block at the top.
"""
open_paren = query.find("as (") + len("as (")
close_paren = query.rindex(")")
Damn, that's stressful to me -- so many edge cases -- but it does seem like you've accounted for the 5 my brain came up with!
I basically looked at what Snowflake returned on the most basic dynamic table. That being said, we control the format of what goes in there since we're writing the DDL that wraps this statement (the user's SQL). We're effectively unwrapping the statement here. I'm sure we'll find an edge case eventually and this will grow. The upside? We can actually compare the query of the object in the database and the query the user provided to see if there's a difference before deciding to deploy (this could be used to simplify view processing in the future).
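A self-contained illustration of that unwrapping on a made-up DDL string (the real code operates on what Snowflake returns, so treat this as an approximation):

query = "create or replace dynamic table my_dt target_lag = '5 minutes' as (select 1 as id)"
open_paren = query.find("as (") + len("as (")   # start of the user's SQL
close_paren = query.rindex(")")                  # last closing paren of the wrapper
print(query[open_paren:close_paren])             # select 1 as id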
{% elif full_refresh_mode or not existing_relation.is_dynamic_table %}
{% set build_sql = snowflake__get_replace_dynamic_table_as_sql(target_relation, sql, existing_relation, backup_relation, intermediate_relation) %}
{% set build_sql = snowflake__replace_dynamic_table_sql(dynamic_table, existing_relation, backup_relation, intermediate_relation) %}
{% else %}

-- get config options
Dangling/incomplete comment?
This is very similar to the new built-in materialized view materialization in dbt-core (if you find/replace "dynamic table" with "materialized view", you might even say they're the same). That being said, I had these comments at the time to try and lead myself (and the next unfortunate soul) through this giant, nested, jinja-dwelling if block. But it goes back to your comment of "comments should say why, not what". And I think what this is suggesting is that maybe it should be its own macro, which would self-document what I'm doing here. I'll try some combination of all of that to make this better.
That'd be rad!
WOW! That's a lot of code! Still, all the boilerplate you wrote made it pretty straightforward to read so kudos.
Hey @mikealfare looking at the original spec, the DT should not be refreshed if there are no changes and their status isn't
I'm not sure what that means? Is that still an
… into feature/dynamic-tables/adap-544
# Conflicts:
# dbt/include/snowflake/macros/materializations/dynamic_table/ddl.sql
# tests/functional/adapter/dynamic_table_tests/fixtures.py
# tests/functional/adapter/dynamic_table_tests/test_dynamic_table.py
This PR is way out of date. The branch will be saved for reference, but the PR is stale.
resolves #603
Description
Implement dynamic tables as a materialization option for Snowflake.
Workflow:
- target_lag: alter dynamic table in place
- warehouse: full refresh

Configuration:
- on_configuration_change: determines the action to take when changes are detected:
  - apply: full refresh on warehouse changes, alter on target_lag changes
  - continue: soft fail, skip the processing of this model, but process downstream
  - fail: fail entire run

Checklist:
- changie new to create a changelog entry