Skip to content

Conversation

erindru
Copy link
Collaborator

@erindru erindru commented Sep 3, 2025

This PR builds on #5273 and #5274

Currently, the RestatementStage clears intervals from state before the BackfillStage populates data.

This means that other processes that look at state while the restatement is running will see missing intervals and try to fill them, competing with the restatement process.

This PR adjusts the plan evaluation order to:

  • move RestatementStage to after BackfillStage
    • note that despite the name, RestatementStage doesn't actually perform restatement, it's just responsible for clearing intervals from state
  • adjust RestatementStage to clear intervals from all environments except prod

The result of this means that:

  • Intervals are never cleared from prod so no other processes see missing intervals and compete with the current plan to try and fill them
  • Intervals are only cleared from dev environments if they were successfully restated in prod

There are some new failure modes:

  • If the plan fails during restatement, prod will be in an inconsistent state. It's expected that the user will run the plan again until it succeeds. Once it succeeds, prod will be back in a consistent state.
  • If a user deploys a new version of the model while the restatement is occurring (a side-effect of keeping restatements internal is that other plans don't know about them), there is a check for new snapshot versions of the restated models. If some are detected, they are output to the console before the plan fails with the usual conflict error:
Screenshot From 2025-09-05 12-05-10

):
self.state_reader = state_reader
self.default_catalog = default_catalog
self.explain = explain
Copy link
Member

@izeigerman izeigerman Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no, let's not do this. Stages should be completely independent from how they are interpreted downstream. That's the whole point. We don't want to have diverging paths between the explainer and the actual evaluation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I just wanted to attach some extra metadata for --explain that isnt needed by the evaluator because the evaluator re-calculates it at runtime in order to capture any drift that may have occurred while the restatements were happening,

If the metadata is not conditionally attached then we incur the overhead of computing it on every plan, even if it's not needed, right?

The goal is to enable CLI output like the following (in an upcoming PR) when someone plans with --explain:
Screenshot From 2025-09-04 08-56-44

because right now, the restatements CLI output is a bit confusing and misleading:
Screenshot From 2025-09-04 09-05-34

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should have a common logic between explainer and the evaluator to compute this information, and then call it in the explainer itself. But it shouldn't happen in the stage builder.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i've updated the explainer to take a RestatementStage and turn it into an ExplainableRestatementStage which contains the fields needed for the extended console output

all_snapshots: t.Dict[str, Snapshot]

# Only used for --explain so may not be populated
snapshot_intervals_to_clear: t.Optional[t.Dict[str, SnapshotIntervalClearRequest]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used for explainer, it might as well use plan.restatements / plan.deployability_index directly

Copy link
Collaborator Author

@erindru erindru Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It contains snapshots that aren't part the plan though (since they exist in other environments and are just getting intervals cleared / not evaluated), so plan.restatements isn't the full picture and thus neither is plan.deployability_index because these both only include snapshots that are being specifically evaluated in the plan.

I was attaching this in the stage builder and not in the explainer because the explainer console doesnt have a reference to the state sync or the current plan, perhaps I should pass these to the explainer console so it can look up extra data it needs to explain something?

@erindru erindru force-pushed the erin/adjust-restatement-stage branch from 4ff0abb to ac4e372 Compare September 5, 2025 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants