Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make update caller receive update failed error on workflow cancellation #653

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

dandavison
Copy link
Contributor

Fixes #575

Today, if a workflow terminates via cancellation with an in-progress update, the update caller sees a gRPC NOT FOUND error (because the server aborts the update after the workflow is closed).

This PR causes the caller to see WorkflowUpdateFailedError, which is what the caller sees in this situation with all other Temporal SDKs.

It does so by cancelling the asyncio Task representing the update handler execution; the resulting exception in the update handler coroutine causes the update to fail.

How this was tested

  • Integration test in the PR

@dandavison dandavison force-pushed the sdk-2453-cancellation branch 2 times, most recently from 1af52e4 to 6b4d5ca Compare September 27, 2024 11:55
@dandavison dandavison marked this pull request as ready for review September 27, 2024 11:55
@dandavison dandavison requested a review from a team as a code owner September 27, 2024 11:55
)
self._in_progress_updates[
job.id
].unfinished_policy = defn.unfinished_policy
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now forced to create the _in_progress_updates entry before this point (when we create the task), but we update the policy from the defn here, since defn is not known until handler coroutine execution time.

task,
temporalio.workflow.HandlerUnfinishedPolicy.WARN_AND_ABANDON,
job.id,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now storing Task in the HandlerExecution dataclass, in order to get hold of unfinished update handler tasks for cancellation.

task.add_done_callback(done_callback)
self._in_progress_signals[id] = HandlerExecution(
job.signal_name, task, defn.unfinished_policy
)
Copy link
Contributor Author

@dandavison dandavison Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually do anything with the signal tasks currently, but we add the Task to the HandlerExecution for consistency with update.

@@ -1845,6 +1856,14 @@ async def _run_top_level_workflow_function(self, coro: Awaitable[None]) -> None:
err
):
self._add_command().cancel_workflow_execution.SetInParent()
self._warn_if_unfinished_handlers()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to maintain our rules regarding unfinished handler warnings, we are forced to issue the warnings here in the case of cancellation (for the other workflow termination types, we continue to do it at the end of activate())

@@ -1845,6 +1856,14 @@ async def _run_top_level_workflow_function(self, coro: Awaitable[None]) -> None:
err
):
self._add_command().cancel_workflow_execution.SetInParent()
self._warn_if_unfinished_handlers()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should warn before sending the cancellation. A user should have the opportunity to receive and react to cancellation without warning.

Comment on lines 439 to 441
# We do also warn in the case of workflow cancellation, but this is done
# when handling the workflow cancellation, since we also cancel update
# handlers at that time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should be at workflow completion time where we warn always, regardless of reason for completion. This includes cancellation. Users should ensure that their handlers complete even during cancellation, but that doesn't need to be checked until the actual workflow completion is being recorded (primary workflow method could be swallowing cancellation, throwing some other error, etc).

)


async def test_update_cancellation(client: Client):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth having a test for doing a cleanup step on cancellation (e.g. in a finally). Can do this in the update or the workflow. This is common in Temporal code and I fear cleanup code in an update with issue a warning.

Copy link
Contributor Author

@dandavison dandavison Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, good point. So to restate what you're saying:

If a workflow cancellation request comes in, and a handler is unfinished, the workflow author should be able to allow the workflow to be cancelled, and to have it gracefully terminate the handler, and not to be criticized by any warning.

That is true today: we don't cancel signals or updates on workflow cancellation, but a user could write a workflow that does, and they would escape the warning, because the handler would not be alive at check time.

But my PR breaks it, by attempting to make the check at workflow cancellation time: it would classify a handler as unfinished when in fact it's about to be gracefully terminated.

The reason my PR does that is that it wanted to continue to warn the workflow author when workflow cancellation causes an update cancellation. And the reason it wants to do that is to maintain consistency with signals: when the workflow is canceled, if we're going to warn on unfinished signal handlers (which we do) then we should warn on unfinished update handlers also.

I think that these desires are mutually incompatible. We have to allow a user to clean up their update gracefully without the warning, as you point out. I see two solutions:

  1. to change things so that we no longer issue unfinished handler warnings for either signal or update, in the case of workflow cancellation.

  2. to automatically cancel signals also (with a workflow logic flag)

There is a strong incentive (caller UX) for automatically canceling update handlers on workflow cancellation. We have never automatically cancelled signal handlers in the Python SDK, and doing so would require a flag as it may be backwards incompatible with replay of existing histories. The inconsistency in cancelling update handlers but not signal handlers would only be a problem if there is a potential for observable side effects.

I'll carry on thinking about it but for now I've updated the PR to stop issuing the unfinished warning on workflow cancellation, for both signal and update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to change things so that we no longer issue unfinished handler warnings for either signal or update, in the case of workflow cancellation.
[...]
I'll carry on thinking about it but for now I've updated the PR to stop issuing the unfinished warning on workflow cancellation, for both signal and update.

I think we should warn on unfinished handlers upon workflow completion regardless of reason for completion and regardless of reason handlers are unfinished. I think the problem with the PR before was it was warning before cancellation is processed and I think the previous behavior of warning after cancellation/completion is processed is best. So I think the line at #653 (comment) should not be removed.

It does mean it's a pain for users doing "cleanup" in update handlers, but IMO that's an education issue. We have to educate them that they should not exit the primary run function until they have done cleanup in handlers. This is similar to process exiting before an async thing has been given a chance to clean up.

Copy link
Contributor Author

@dandavison dandavison Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're suggesting that we continue to do the check late, and warn on workflow cancellation.

We can't do that, because in the common case we will then warn on unfinished signals (since these are not automatically cancelled, and hence the handlers will still be alive) but not warn on unfinished updates (since these are automatically cancelled, and hence the handlers will be dead).

(I did try to explain that in the comment above #653 (comment), but it is turning out to be a slightly tricky subject! And my comment isn't very clear).

This is why I've switched the PR to no longer warn on cancellation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm from our off-PR discussion, we are still warning on unfinished handler regardless of reason (i.e. cancellation)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, still thinking about the design here. I've put the PR into draft mode for now.

@dandavison dandavison marked this pull request as draft October 1, 2024 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Workflow cancellation should cancel handler executions
2 participants