Possible Duplicate Job Execution in Cluster After DB Connection loss and resume #585
I am wondering what the behavior of db-scheduler is in this scenario.
Answered by
kagkarlsson
Dec 18, 2024
Replies: 1 comment 3 replies
It is a bit unlikely that Node A simply resumes working, i.e. that it has been idling while waiting for a healthy connection. You can choose not to allow reviving dead executions, or define a special handler for them by overriding |
If you lose the db-connection to such a degree that heartbeating also stops (which runs in another thread on another db-connection), then most likely the ongoing transaction will never commit, since the app probably needs to re-establish all database-connections.
There are scenarios where it still might happen, like stop-the-world events (gc or similar) with a duration longer than the limit for dead-executions discovery. You could in theory get the execution from the database before committing and check that the version number is unchanged. I don't know of anyone doing this though (I think it is overly paranoid), but you could
(the version-number is incremented on most updates on the e…
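
For illustration, the version check described above could look roughly like the following. This is only a minimal, self-contained Java sketch that simulates the executions table with an in-memory map. `VersionCheckSketch`, `touch` and `runAndCommitIfUnchanged` are hypothetical names made up for this example and are not part of db-scheduler's API. In a real setup, the re-read of the version and the commit would need to happen in the same database transaction (and the check is still not airtight against a pause between the check and the commit).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, NOT db-scheduler API: an in-memory stand-in for the
// executions table, mapping execution id -> current version number.
final class VersionCheckSketch {
    static final Map<String, Long> versions = new ConcurrentHashMap<>();

    // Simulates an update to the execution row by someone else
    // (for example another node reviving it as a dead execution).
    static void touch(String executionId) {
        versions.merge(executionId, 1L, Long::sum);
    }

    // Runs the work, then re-reads the version before "committing".
    // Returns true if the version is unchanged (safe to commit),
    // false if the row was modified in the meantime (caller should roll back).
    static boolean runAndCommitIfUnchanged(String executionId, long pickedVersion, Runnable work) {
        work.run();
        Long current = versions.get(executionId);
        return current != null && current == pickedVersion;
    }
}
```

If the execution was picked with version 1 and nothing else touched the row, the method returns `true`. If another node updated the row while the work was running, the version no longer matches and the method returns `false`, which is the signal to roll back instead of committing.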