Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement WorkflowStartDelay for StartChildWorkflowExecutionCommand #6401

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lina-temporal
Copy link
Contributor

@lina-temporal lina-temporal commented Aug 13, 2024

  • Implement WorkflowStartDelay for StartChildWorkflowExecutionCommand

What changed?

Implemented service support for the WorkflowStartDelay field. See API PR.

Why?

How did you test it?

  • go test -v "go.temporal.io/server/tests" -testify.m TestWorkflowStartDelayChildWorkflowExecution
  • Updated unit tests, make unit-test

Potential risks

Documentation

  • Docs in the API are updated

Is hotfix candidate?

  • No

Copy link
Member

@yycptt yycptt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This will break the standby task logic which tries to verify if the first workflow task is scheduled for the child workflow or not. I think we'll need to extend the VerifyFirstWorkflowTaskScheduled api implementation to also check if a workflow backoff timer is created.

  • We also will need to introduce a feature flag for this I think. Otherwise during upgrade, event can be created with a delay but when processing the task, due to shard movement, the processing can happen on an old host which doesn't know about this field. Similar thing could also when there are multiple clusters and failover I think.

@yycptt yycptt requested a review from yux0 August 14, 2024 21:37
@yycptt
Copy link
Member

yycptt commented Aug 23, 2024

  • This will break the standby task logic which tries to verify if the first workflow task is scheduled for the child workflow or not. I think we'll need to extend the VerifyFirstWorkflowTaskScheduled api implementation to also check if a workflow backoff timer is created.
  • We also will need to introduce a feature flag for this I think. Otherwise during upgrade, event can be created with a delay but when processing the task, due to shard movement, the processing can happen on an old host which doesn't know about this field. Similar thing could also when there are multiple clusters and failover I think.

Update from offline discussion. The ^ two issues are easy to resolve we can add a new field in mutable state to denote if a workflow task backoff timer is created or not.

However, today's replication stack is event based and unable to replicate the fact that a workflow task backoff timer is scheduled in the active cluster. As a result, standby cluster has no way to verify that the startChildWorkflow transfer task is fully completed in the active cluster without making a cross cluster call.

The new state-based replication or start child improvement project will unblock this.

Or we have to make cross cluster call to the active cluster to check if the backoff timer is scheduled or not. If so, create the same task in standby cluster as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants