Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add workflow_start_delay to StartChildWorkflowExecutionCommand #441

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lina-temporal
Copy link
Contributor

@lina-temporal lina-temporal commented Aug 8, 2024

What changed?
API changes to support setting a workflow start delay for child workflow executions.

See #515

Why?

  • Workflow start delay is already implemented in the server, this change lets customers set it for child workflows, as well.

Breaking changes
None

Server PR

@lina-temporal lina-temporal requested review from a team as code owners August 8, 2024 18:42
@@ -226,6 +226,11 @@ message StartChildWorkflowExecutionCommandAttributes {
// If this is set, the child workflow inherits the Build ID of the parent. Otherwise, the assignment
// rules of the child's Task Queue will be used to independently assign a Build ID to it.
bool inherit_build_id = 17;
// Time to wait before dispatching the child workflow's first task. Cannot
// be used with `cron_schedule`. If the workflow gets a signal before the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think signal won't/shouldn't cause the workflow task to be dispatched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signal on the parent workflow shouldn't cause it to be dispatched, but I'd expect a signal on the child workflow that's started with a delay to cause it to immediately start (same as existing WF logic), is that incorrect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Child workflow is still a workflow, if it has a start delay, then signal will not trigger a workflow task to be dispatched thus unblocking the workflow.

Are you saying ^ is not the case? If so, it's a bug we need to fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying ^ is not the case?

Yes, per StartWorkflowExecutionRequest.workflow_start_delay docs, the behavior is that a signal interrupts the delay

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the comment. If what the comment says is true, that needs to be fixed.

But I just tried this locally and signal won't interrupt the start delay.

➜ temporal (main) ✔ temporal workflow start --type TestWorkflow -w test-workflow-id -t test-task-queue --start-delay 1h
Running execution:
  WorkflowId  test-workflow-id
  RunId       c619050c-adf0-4ed3-acaa-d1b1f8607320
  Type        TestWorkflow
  Namespace   default
  TaskQueue   test-task-queue

➜ temporal (main) ✔ ./tctl wf show --wid test-workflow-id                      
  1  WorkflowExecutionStarted  {WorkflowType:{Name:TestWorkflow}, ParentInitiatedEventId:0,  
                                TaskQueue:{Name:test-task-queue, Kind:Normal},                
                                WorkflowExecutionTimeout:0s, WorkflowRunTimeout:0s,           
                                WorkflowTaskTimeout:10s, Initiator:Unspecified,               
                                OriginalExecutionRunId:c619050c-adf0-4ed3-acaa-d1b1f8607320,  
                                Identity:temporal-cli:[email protected],       
                                FirstExecutionRunId:c619050c-adf0-4ed3-acaa-d1b1f8607320,     
                                Attempt:1, FirstWorkflowTaskBackoff:1h0m0s,                   
                                ParentInitiatedEventVersion:0}                                
   
➜ temporal (main) ✔ ./tctl wf signal --wid test-workflow-id --rid c619050c-adf0-4ed3-acaa-d1b1f8607320 --name test-signal
Signal workflow succeeded.

➜ temporal (main) ✔ ./tctl wf show --wid test-workflow-id                                                              
  1  WorkflowExecutionStarted   {WorkflowType:{Name:TestWorkflow}, ParentInitiatedEventId:0,  
                                 TaskQueue:{Name:test-task-queue, Kind:Normal},                
                                 WorkflowExecutionTimeout:0s, WorkflowRunTimeout:0s,           
                                 WorkflowTaskTimeout:10s, Initiator:Unspecified,               
                                 OriginalExecutionRunId:c619050c-adf0-4ed3-acaa-d1b1f8607320,  
                                 Identity:temporal-cli:[email protected],       
                                 FirstExecutionRunId:c619050c-adf0-4ed3-acaa-d1b1f8607320,     
                                 Attempt:1, FirstWorkflowTaskBackoff:1h0m0s,                   
                                 ParentInitiatedEventVersion:0}                                
  2  WorkflowExecutionSignaled  {SignalName:test-signal,                                      
                                 Identity:[email protected]}           

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same results as you:


$ temporal workflow show -w parent-workflow_2d80ca74-86ff-43b1-a540-63e5e8d59e8d        
Progress:
  ID           Time                     Type           
    1  2024-08-14T22:24:33Z  WorkflowExecutionStarted  
$ temporal workflow describe -w parent-workflow_2d80ca74-86ff-43b1-a540-63e5e8d59e8d 
Execution Info:
  WorkflowId            parent-workflow_2d80ca74-86ff-43b1-a540-63e5e8d59e8d
  RunId                 a3560511-7678-44fb-a9a4-22a7cbea641d
  Type                  SampleParentWorkflow
  Namespace             default
  TaskQueue             child-workflow
  StartTime             2 minutes ago
  ExecutionTime         7 minutes from now
  StateTransitionCount  1
  HistoryLength         1
  HistorySize           257

Pending Activities: 0
Pending Child Workflows: 0
$ temporal workflow signal -w parent-workflow_2d80ca74-86ff-43b1-a540-63e5e8d59e8d --name "whatever-signal"
Signal workflow succeeded
$ temporal workflow show -w parent-workflow_2d80ca74-86ff-43b1-a540-63e5e8d59e8d                                                                                                                                           
Progress:
  ID           Time                     Type           
    1  2024-08-14T22:24:33Z  WorkflowExecutionStarted  
    2  2024-08-14T22:26:56Z  WorkflowExecutionSignaled 

$ temporal workflow describe -w parent-workflow_2d80ca74-86ff-43b1-a540-63e5e8d59e8d                                                                                                                                       
Execution Info:
  WorkflowId            parent-workflow_2d80ca74-86ff-43b1-a540-63e5e8d59e8d
  RunId                 a3560511-7678-44fb-a9a4-22a7cbea641d
  Type                  SampleParentWorkflow
  Namespace             default
  TaskQueue             child-workflow
  StartTime             2 minutes ago
  ExecutionTime         7 minutes from now
  StateTransitionCount  2
  HistoryLength         2
  HistorySize           347

Pending Activities: 0
Pending Child Workflows: 0

I'll update the comments on the existing API as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my own curiosity, found where SignalWorkflow handles the backoff WFT

Copy link
Member

@cretz cretz Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It appears the original form from temporalio/temporal#3984 didn't do this, and temporalio/temporal#4091 came a month later to change the behavior and the docs never got updated.

Many of our SDKs used these docs to write their own, e.g https://pkg.go.dev/go.temporal.io/sdk/internal#StartWorkflowOptions.StartDelay:

If the workflow gets a signal before the delay, a workflow task will be dispatched and the rest of the delay will be ignored.

And https://www.javadoc.io/doc/io.temporal/temporal-sdk/latest/io/temporal/client/WorkflowOptions.Builder.html and so on. So I will make a note to update those SDK docs too.

Copy link
Member

@yycptt yycptt Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual situation is more complex than this...

  • SignalWorkflowExecution will never unblock (basically ignoring skip_generate_workflow_task field in the request when workflow needs a delay).
  • SignalWithStart with skip_generate_workflow_task = true will not unblock. SignalWithStart with skip_generate_workflow_task = false will unblock.

I am thinking we should fix SignalWithStart behavior to match Signal. That's a bug we know about but not fixing for a long time so that we have a way to unblock workflow with delay (cron/retry/start_delay). But now that I think about it, we should just introduce a new tdbg command for unblocking the workflow.

Changing the behavior of SignalWithStart of server side could be regarded as breaking, but my understanding is that very few people is relying on that. We can add a feature flag in for the change, default to the new behavior (or old behavior, and new behavior in release X+1), and call out this change in release note.

Before we make ^ change, I think we should change the comment for all start delay fields to be the same as the one we have in the SignalWithStart request

// Time to wait before dispatching the first workflow task. Cannot be used with cron_schedule.
// Note that the signal will be delivered with the first workflow task. If the workflow gets
// another SignalWithStartWorkflow before the delay and skip_generate_workflow_task is false
// or not set, a workflow task will be dispatched immediately and the rest of the delay period
// will be ignored, even if that request also had a delay. Signal via SignalWorkflowExecution
// will not unblock the workflow.

Copy link
Member

@yycptt yycptt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking this PR for now as there's a blocker from replication side for server implementation.

Check temporalio/temporal#6401 (comment) for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants