Feat: prevent other processes seeing missing intervals during restatement #5285
base: erin/fix-restatement-clear-across-all-environments
sqlmesh/core/plan/stages.py
Outdated
    ):
        self.state_reader = state_reader
        self.default_catalog = default_catalog
        self.explain = explain
oh no, let's not do this. Stages should be completely independent from how they are interpreted downstream. That's the whole point. We don't want to have diverging paths between the explainer and the actual evaluation.
Well, I just wanted to attach some extra metadata for `--explain` that isn't needed by the evaluator, because the evaluator re-calculates it at runtime in order to capture any drift that may have occurred while the restatements were happening.
If the metadata is not conditionally attached, then we incur the overhead of computing it on every plan, even if it's not needed, right?
The goal is to enable CLI output like the following (in an upcoming PR) when someone plans with `--explain`:
because right now, the restatements CLI output is a bit confusing and misleading:
Then we should have a common logic between explainer and the evaluator to compute this information, and then call it in the explainer itself. But it shouldn't happen in the stage builder.
Ok, I've updated the explainer to take a `RestatementStage` and turn it into an `ExplainableRestatementStage`, which contains the fields needed for the extended console output.
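A minimal sketch of the split discussed in this thread, not the actual SQLMesh code: the stage stays a plain description of the requested work, and a shared helper derives the extra detail so that the explainer and the evaluator cannot diverge. Only the names `RestatementStage` and `ExplainableRestatementStage` come from the discussion; the fields, the `Interval` type, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) pair


@dataclass(frozen=True)
class RestatementStage:
    # Assumed shape: the stage only records what the plan asked to restate.
    restatements: Dict[str, Interval]


@dataclass(frozen=True)
class ExplainableRestatementStage:
    # Assumed shape: the original stage plus the detail needed for console output.
    stage: RestatementStage
    intervals_to_clear: Dict[str, Interval]


def compute_intervals_to_clear(
    stage: RestatementStage, snapshots_in_state: Dict[str, List[str]]
) -> Dict[str, Interval]:
    """Hypothetical shared helper, callable from both the explainer and the evaluator.

    `snapshots_in_state` maps a restated model name to the snapshot names that
    currently exist for it in state, across all environments.
    """
    result: Dict[str, Interval] = {}
    for model, interval in stage.restatements.items():
        for snapshot in snapshots_in_state.get(model, []):
            result[snapshot] = interval
    return result


def explain(
    stage: RestatementStage, snapshots_in_state: Dict[str, List[str]]
) -> ExplainableRestatementStage:
    # The explainer enriches the stage itself; the stage builder stays unaware of --explain.
    return ExplainableRestatementStage(
        stage, compute_intervals_to_clear(stage, snapshots_in_state)
    )
```

In this sketch the evaluator would call `compute_intervals_to_clear` again at runtime, so any drift in state since planning is picked up without a second code path.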
sqlmesh/core/plan/stages.py
Outdated
    all_snapshots: t.Dict[str, Snapshot]

    # Only used for --explain so may not be populated
    snapshot_intervals_to_clear: t.Optional[t.Dict[str, SnapshotIntervalClearRequest]]
If this is only used for the explainer, it might as well use `plan.restatements` / `plan.deployability_index` directly.
It contains snapshots that aren't part of the plan though (since they exist in other environments and are just getting intervals cleared, not evaluated), so `plan.restatements` isn't the full picture and thus neither is `plan.deployability_index`, because these both only include snapshots that are being specifically evaluated in the plan.
I was attaching this in the stage builder and not in the explainer because the explainer console doesn't have a reference to the state sync or the current plan. Perhaps I should pass these to the explainer console so it can look up the extra data it needs to explain something?
4ff0abb to ac4e372
This PR builds on #5273 and #5274
Currently, the `RestatementStage` clears intervals from state before the `BackfillStage` populates data.
This means that other processes that look at state while the restatement is running will see missing intervals and try to fill them, competing with the restatement process.
This PR adjusts the plan evaluation order to:
- Move the `RestatementStage` to after `BackfillStage`
  - Note that `RestatementStage` doesn't actually perform restatement, it's just responsible for clearing intervals from state
- Change the `RestatementStage` to clear intervals from all environments except `prod`

The result of this is that:
- Intervals are not cleared from `prod`, so no other processes see missing intervals and compete with the current plan to try and fill them

There are some new failure modes:
- If the plan fails, `prod` will be in an inconsistent state. It's expected that the user will run the plan again until it succeeds. Once it succeeds, `prod` will be back in a consistent state.
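
The reordering described above can be illustrated with a toy model. This is a sketch under stated assumptions, not SQLMesh code: state is a hypothetical dictionary mapping each environment name to the set of interval ids it has filled, and `restate` is a made-up function that mimics the new order (backfill first, then clear intervals everywhere except `prod`).

```python
from typing import Dict, List, Set

# Hypothetical stand-in for state: environment name -> ids of filled intervals.
State = Dict[str, Set[int]]


def restate(state: State, restated: Set[int]) -> List[Set[int]]:
    """Apply the new stage order and return the gaps an outside observer
    would see in prod after each step."""
    expected_in_prod = set(state["prod"])
    prod_gaps_seen: List[Set[int]] = []

    # Step 1 (backfill): data is recomputed, but prod's intervals stay in state.
    prod_gaps_seen.append(expected_in_prod - state["prod"])

    # Step 2 (restatement stage): clear the restated intervals from every
    # environment except prod.
    for env, intervals in state.items():
        if env != "prod":
            intervals.difference_update(restated)
    prod_gaps_seen.append(expected_in_prod - state["prod"])

    return prod_gaps_seen


state = {"prod": {1, 2, 3}, "dev": {1, 2, 3}}
gaps = restate(state, restated={2, 3})
```

In this toy run, `gaps` contains only empty sets, since `prod` never loses an interval, while `dev` is left with interval `1` and would be refilled the next time it is planned.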