Fix ErrStateMachineNotFound handling in HSM state replication #7032
base: main
@@ -183,10 +183,19 @@ func (r *HSMStateReplicatorImpl) syncHSMNode(
 incomingNodePath := incomingNode.Path()
 currentNode, err := currentHSM.Child(incomingNodePath)
 if err != nil {
-	// 1. Already done history resend if needed before,
-	// and node creation today always associated with an event
-	// 2. Node deletion is not supported right now.
-	// Based on 1 and 2, node should always be found here.
+	// The node may not be found if:
+	// 1. The state machine was deleted (e.g. terminal state cleanup)
+	// 2. We're missing events that created this node
+	if errors.Is(err, hsm.ErrStateMachineNotFound) {
+		// In terminal state, nodes can be deleted
+		// Ignore the error and continue processing other nodes
+		r.logger.Debug("State machine not found - likely deleted in terminal state",
+			tag.WorkflowNamespaceID(mutableState.GetExecutionInfo().NamespaceId),
+			tag.WorkflowID(mutableState.GetExecutionInfo().WorkflowId),
+			tag.WorkflowRunID(mutableState.GetExecutionInfo().OriginalExecutionRunId),
+		)
+		return nil

Review comment:
nit: I'd add a sanity check that version history in the mutable state is > the one in the request (same as the one on L265, or just return that info from compareVersionHistory), and return an error otherwise.

Review comment:
Don't we already check this in compareVersionHistory? In other words, an error will be returned by

Review comment:
hmm not sure I follow. Error is returned from compareVersionHistory if last version history item of the (local) mutable state is < that in the request. The check I mentioned is for > (also not the same as the >= checked in compareVersionHistory)

+	}
 	return err
 }
That's not true based on the comment that you deleted.
I would also clarify that creation and deletion are always associated with an event.