[14.0][IMP] queue_job: cancel job before it retries #773
Hi @guewen,
Makes sense, thanks!
I'm not convinced we should do this.
From a UX perspective this is not cancelling the job, it merely prevents it from being retried. Could we have something that does that explicitly? Such as limiting the retry count?
As a side note, I still think the concept of cancelling a job is unclear. In my mind, a job is conceptually part of the transaction that created it and we expect it to be eventually completed. That is why there was also a Set Done button initially.
Also, last time I checked it did not work with job graphs.
I understand that it is not a perfect solution, but it helps in real situations. The Cancel action is a last-resort solution when there is a Production issue. It does not matter much that the job finishes running (trying) before it is actually cancelled. Moreover, one of the cases tackled here does not comply with the
In my opinion, the Cancel button should be a convenient tool for the system admin who needs to unblock Production. Then, when the system load goes down, the administrator can easily find the "cancelled" jobs and use the "Requeue" button to process them, one at a time.
But if I understand correctly, in your use case, you could cancel a running job, and it could still end up in the done state? Is that not counterintuitive?
I did not know these were retried indefinitely. Is that intended? Shouldn't we address that instead?
Yes, if the job has just started, it can finish successfully even if the operator presses the Cancel button.
Regarding the infinite retry, code is here:
Maybe it's legit to keep retrying, if the database load decreases later and the job can terminate successfully. I don't know.
TIL. I have been catching SerializationFailure and rethrowing RetryableJobError myself for years! I have never seen a job actually make it all the way to "infinite" retries. Even my most congested jobs, running in the tens of thousands with a couple dozen workers in parallel, still resolve after my
Alternate perspective: Would it make more sense to have the job run out of retries and fail (with some sort of notification), so that an admin could restart the job later, rather than having to recover it in real time?
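To make the two options discussed above concrete, here is a minimal, self-contained sketch in plain Python. It is not the queue_job implementation: `Job`, `run`, `cancel_requested` and the local `RetryableJobError` class are hypothetical stand-ins, and treating `max_retries=0` as "retry forever" is an assumption of this sketch. It shows a bounded retry count that ends in a failed state, and a cancel request that is only honoured before an attempt starts, so a job that is already running can still end up done.

```python
from dataclasses import dataclass
from typing import Callable


class RetryableJobError(Exception):
    """Local stand-in for a retryable error (not the queue_job class)."""


@dataclass
class Job:
    func: Callable[[], None]
    max_retries: int = 5  # assumption of this sketch: 0 means "retry forever"
    state: str = "pending"
    retry: int = 0


def run(job: Job, cancel_requested: Callable[[], bool] = lambda: False) -> str:
    """Run a job, checking for a cancel request before every attempt."""
    while True:
        # The cancel request is checked before the first try and before each
        # retry, never in the middle of an attempt.
        if cancel_requested():
            job.state = "cancelled"
            return job.state
        job.state = "started"
        try:
            job.func()
        except RetryableJobError:
            job.retry += 1
            # Bounded retries: give up and fail once the budget is used.
            if job.max_retries and job.retry >= job.max_retries:
                job.state = "failed"
                return job.state
            job.state = "pending"
            continue
        job.state = "done"
        return job.state


def always_busy() -> None:
    # Simulates a job that keeps hitting a retryable error.
    raise RetryableJobError("simulated serialization failure")


if __name__ == "__main__":
    print(run(Job(always_busy, max_retries=3)))
    print(run(Job(always_busy, max_retries=0), cancel_requested=lambda: True))
```

With a retry budget, the always-failing job stops on its own and can be requeued later; without one, only the cancel check before the next retry stops it.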
dc2571c to ca3e0dd
ca3e0dd to b7bcdc3
Use case:
This patch allows running jobs to be cancelled: