-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix child WF ID generation #1803
base: master
Are you sure you want to change the base?
Conversation
Questions here:
|
Java seems to be generating UUIDs when an ID is not provided in ChildWorkflow options. Additionally when I replayed an existing
Absolutely yes. Java seems to be already doing this (at least in my testing). Looks like it changes the seed at reset-point?
Yes. So far all server versions have prevented resetting of workflows when the reset point falls in between child Init & Completed events (i.e within the lifespan of child). So no potential user impact or backward compatibility issues there.
Yes it is possible to have these two values different at child creation time. When the reset-point is outside the child's lifespan (Init - Completed) then the server has always allowed resets. So in cases when the reset-point is before the child-init, the parent will try to create a child again and this time |
@gow - thanks for this info! So reading all of that I am seeing two things:
For 1, I will start an internal discussion and invite you. Regardless, I am not sure the desired outcome is what is in this PR. |
The desired outcome of this PR is for Go-SDK to be able to replay |
Putting this on hold. Will discuss internally and work on a better fix. |
c54c992
to
ab1a0c3
Compare
ab1a0c3
to
f0eb17d
Compare
After an internal discussion, we decided to bring Go SDK on par with the rest of the SDKs. So instead of simply using OriginalRunID, we are using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM, deferring to @Quinn-With-Two-Ns or @cretz for approval. I vaguely remember there was a test they were hoping was added, but can't remember exactly what that test was.
attr := event.GetWorkflowTaskFailedEventAttributes() | ||
if attr.GetCause() == enumspb.WORKFLOW_TASK_FAILED_CAUSE_RESET_WORKFLOW { | ||
weh.workflowInfo.childWorkflowIDSeed = attr.GetNewRunId() | ||
} | ||
// No Operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stale comment
@@ -176,7 +176,7 @@ func (s *PollLayerInterfacesTestSuite) TestGetNextCommands() { | |||
createTestEventWorkflowTaskStarted(3), | |||
{ | |||
EventId: 4, | |||
EventType: enumspb.EVENT_TYPE_WORKFLOW_TASK_FAILED, | |||
EventType: enumspb.EVENT_TYPE_WORKFLOW_TASK_TIMED_OUT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this changed from failed to timed_out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because we are now not skipping workflow task failed events in history.prepareTask()
(internal/internal_task_handlers.go) so that they can be replayed. Some of the existing tests relied on this event being skipped. So I changed them to use other events that are still being skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely remember there was a test they were hoping was added, but can't remember exactly what that test was.
We need to add some integration tests here. I'm trying to figure out the structuring and setup of them in this codebase. I'll probably sync with you all offline to get some pointers.
@@ -176,7 +176,7 @@ func (s *PollLayerInterfacesTestSuite) TestGetNextCommands() { | |||
createTestEventWorkflowTaskStarted(3), | |||
{ | |||
EventId: 4, | |||
EventType: enumspb.EVENT_TYPE_WORKFLOW_TASK_FAILED, | |||
EventType: enumspb.EVENT_TYPE_WORKFLOW_TASK_TIMED_OUT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because we are now not skipping workflow task failed events in history.prepareTask()
(internal/internal_task_handlers.go) so that they can be replayed. Some of the existing tests relied on this event being skipped. So I changed them to use other events that are still being skipped.
// Use the first execution run ID from the start event as the initial seed. | ||
// First execution run ID stays the same for the entire chain of workflow resets. | ||
// This helps us keep child workflow IDs consistent up until the next reset point. | ||
weh.workflowInfo.childWorkflowIDSeed = attributes.GetFirstExecutionRunId() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can FirstExecutionRunID be empty ever? The current version of the server is guaranteed to include it in the start event. But could there be some long running WFs with this field being empty in start event?
Asking to see if we want to be defensive here and fall back to RunID if FirstExecutionRunID is empty.
What was changed
We now use
workflowInfo.OriginalRunID
in parent when it is generating a child workflow ID.Why?
We need this change to enable resetting of workflows that have pending children. What this means is Go SDK should be able to successfully replay
StartChildWorkflowExecutionInitiated
event in a parent that was reset.Without this change Go SDK will run into non-determinism error since the child ID it generates (based on new run ID) doesn't match with the ID available in the
StartChildWorkflowExecutionInitiated
event. SinceOriginalRunID
stays the same across resets, with this change the SDK can successfully replayStartChildWorkflowExecutionInitiated
events.Checklist
Closes
N/A
How was this tested:
Manually tested it (unit tests coming)
Any docs updates needed?
N/A