Add 'tdbg workflow unblock' subcommand #6410

Open
wants to merge 1 commit into
base: main

lina-temporal
Contributor

What changed?

The tdbg subcommand workflow unblock was added, along with the machinery needed to expose it through the internal AdminService and HistoryService protos.

Why?

As part of development of the workflow start delay support for child workflows, @yycptt mentioned it would be useful to have a tool to immediately unblock blocked workflows. Particularly since we're working in this area and may change the behavior of existing block tasks, it seems useful to have an oncall tool ready.

How did you test it?

  • New test for history_engine
  • Ran it manually with a workflow that set a 10 minute start delay:
$ ./tdbg w show -wid parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359 -rid ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c   
======== batch 1, blob len: 256 ======
[{"eventId":"1","eventTime":"2024-08-16T23:33:08.065732Z","eventType":"EVENT_TYPE_WORKFLOW_EXECUTION_STARTED","taskId":"19922948","workflowExecutionStartedEventAttributes":{"workflowType":{"name":"SampleParentWorkflow"},"taskQueue":{"name":"child-workflow","kind":"TASK_QUEUE_KIND_NORMAL"},"workflowExecutionTimeout":"0s","workflowRunTimeout":"0s","workflowTaskTimeout":"30s","originalExecutionRunId":"ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c","identity":"[email protected]@","firstExecutionRunId":"ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c","attempt":1,"firstWorkflowTaskBackoff":"600s","header":{},"workflowId":"parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359"}}]
======== total batches 1, total blob len: 256 ======

$ ./tdbg w unblock -wid parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359 -rid ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c   
Namespace: default WorkflowID: parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359 RunID: ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c
Immediately unblock workflow? [y/N]: y
Workflow execution unblocked.

$ ./tdbg w show -wid parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359 -rid ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c   
======== batch 1, blob len: 256 ======
[{"eventId":"1","eventTime":"2024-08-16T23:33:08.065732Z","eventType":"EVENT_TYPE_WORKFLOW_EXECUTION_STARTED","taskId":"19922948","workflowExecutionStartedEventAttributes":{"workflowType":{"name":"SampleParentWorkflow"},"taskQueue":{"name":"child-workflow","kind":"TASK_QUEUE_KIND_NORMAL"},"workflowExecutionTimeout":"0s","workflowRunTimeout":"0s","workflowTaskTimeout":"30s","originalExecutionRunId":"ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c","identity":"[email protected]@","firstExecutionRunId":"ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c","attempt":1,"firstWorkflowTaskBackoff":"600s","header":{},"workflowId":"parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359"}}]
======== batch 2, blob len: 53 ======
[{"eventId":"2","eventTime":"2024-08-16T23:34:12.959775Z","eventType":"EVENT_TYPE_WORKFLOW_TASK_SCHEDULED","taskId":"19922953","workflowTaskScheduledEventAttributes":{"taskQueue":{"name":"child-workflow","kind":"TASK_QUEUE_KIND_NORMAL"},"startToCloseTimeout":"30s","attempt":1}}]
======== batch 3, blob len: 139 ======
[{"eventId":"3","eventTime":"2024-08-16T23:34:12.978013Z","eventType":"EVENT_TYPE_WORKFLOW_TASK_STARTED","taskId":"19922956","workflowTaskStartedEventAttributes":{"scheduledEventId":"2","identity":"[email protected]@","requestId":"cc1aa23f-9e53-4c3b-bffd-dde947c0f986","historySizeBytes":"309","workerVersion":{"buildId":"6158a7b6c1586d4c4546cd6a0c0ea6e7"}}}]
======== batch 4, blob len: 307 ======
** SNIP **

$ ./tdbg w unblock -wid parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359 -rid ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c   
Namespace: default WorkflowID: parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359 RunID: ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c
Immediately unblock workflow? [y/N]: y
Error: Unable to unblock workflow execution: rpc error: code = InvalidArgument desc = workflow `{b9377030-2e97-4cc2-a990-4bb6b743d170 parent-workflow_e6ffb667-8afa-43e2-88be-e5957207f359 ee5919a6-8c0d-4ad2-b6dd-ae078af1f57c}` isn't blocked on a backoff task
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
Screenshot 2024-08-16 at 4 36 32 PM

Screenshot showing the workflow task being scheduled after the workflow is unblocked. Note the First Workflow Task Backoff field on the first event compared with the timestamps of the following events.

Potential risks

  • None, new command

Documentation

  • I'm not sure if this is applicable to any existing runbooks

Is hotfix candidate?

  • No

@lina-temporal lina-temporal requested a review from a team as a code owner August 16, 2024 23:38
@@ -276,6 +276,10 @@ service HistoryService {
rpc RefreshWorkflowTasks(RefreshWorkflowTasksRequest) returns (RefreshWorkflowTasksResponse) {
}

// UnblockWorkflowExecution immediately unblocks a blocked/delayed workflow.
rpc UnblockWorkflowExecution(UnblockWorkflowExecutionRequest) returns (UnblockWorkflowExecutionResponse) {
Member

There's already a ScheduleWorkflowTask RPC defined for the history service. Let's see if that can be reused; we may need to introduce some new fields in the request.

@@ -198,6 +198,26 @@ func newAdminWorkflowCommands(clientFactory ClientFactory, prompterFactory Promp
return AdminRefreshWorkflowTasks(c, clientFactory)
},
},
{
Name: "unblock",
Member

I wouldn't call this unblock, as it's too ambiguous what "unblock" could mean here. A workflow could be blocked waiting for an activity to be started, a child workflow to complete, etc.
I think it makes sense to make the name specific here and make it clear that it's scheduling a workflow task regardless of delays.
