Fixed bug: Lost DB connection causes jobs to not be processed anymore. #163

Open
wants to merge 1 commit into
base: master

thorsteneckel

Hi there lovely maintainers,

first of all: thanks for this great gem! It does a great job over at Zammad. However, we ran into an issue in one of our customers' installations. Long story short: the DB connection socket gets closed (due to a restarting DB) while Delayed::Job is reserving a job. Delayed::Job rescues the raised exception, logs it as INFO(!), calls the recover_from method on the backend, and then either tries to re-run the job or skips it. An example log entry looks like this:

I, [2018-11-12T16:56:33.875406 #4559] INFO -- : 2018-11-12T16:56:33+0100: [Worker(host:some_name pid:1337)] Error while reserving job: PG::ConnectionBad: PQsocket() can't get socket descriptor: UPDATE "delayed_jobs" SET locked_at = '2018-11-12 15:56:33.873572', locked_by = 'host:some_name pid:1337' WHERE id IN (SELECT "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2018-11-12 15:56:33.872595' AND (locked_at IS NULL OR locked_at < '2018-11-12 11:56:33.872665') OR locked_by = 'host:some_name pid:1337') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *

This gem does not yet implement the Delayed::Job recover_from callback. This PR changes that: after any exception is raised while processing a job, it makes sure the DB connection is still present. If the DB connection is lost and can't be reestablished, a new exception is raised and has to be handled accordingly.
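
To illustrate the idea without a database, here is a minimal sketch. It is not the diff of this PR: `FakeConnection`, `SketchJob`, `ConnectionLost`, and `reserve_with_recovery` are hypothetical stand-ins. The only real API it mirrors is ActiveRecord's `verify!`, which reconnects when the connection is no longer active, and the worker loop is simplified.

```ruby
# Hypothetical error type standing in for PG::ConnectionBad.
class ConnectionLost < StandardError; end

# Hypothetical stand-in for an ActiveRecord connection adapter.
class FakeConnection
  attr_reader :reconnects

  def initialize(reachable: true)
    @reachable = reachable # can the database be reached at all?
    @active = true         # is the current socket usable?
    @reconnects = 0
  end

  # Simulates the socket being closed by a restarting DB.
  def drop!
    @active = false
  end

  def active?
    @active
  end

  # Mirrors the behaviour of ActiveRecord's `verify!`:
  # do nothing if the connection is active, otherwise reconnect.
  def verify!
    return if active?
    raise ConnectionLost, "database unreachable" unless @reachable

    @reconnects += 1
    @active = true
  end
end

# Hypothetical stand-in for the backend job class.
class SketchJob
  class << self
    attr_accessor :connection

    # Hook the worker calls after an error while reserving a job.
    # If reconnecting fails, the new exception propagates to the caller.
    def recover_from(_error)
      connection.verify!
    end
  end
end

# Simplified version of the worker's reserve step: rescue the error,
# let the backend recover, and skip this round by returning nil.
def reserve_with_recovery(job_class)
  yield
rescue StandardError => error
  job_class.recover_from(error)
  nil
end

SketchJob.connection = FakeConnection.new
SketchJob.connection.drop!
result = reserve_with_recovery(SketchJob) do
  raise ConnectionLost, "PQsocket() can't get socket descriptor"
end
```

After this runs, `result` is `nil` (the reservation is skipped for this round) and the stand-in connection is active again, so the next reservation attempt can succeed. With `FakeConnection.new(reachable: false)`, `verify!` raises instead, which corresponds to the "new exception" case described above.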

Sadly, I have no clue how to provide tests for this. I tried my best but couldn't figure it out. Please let me know how you would approach it and I'll be happy to add them.

Greetings from Germany 👋

@thorsteneckel
Author

The failing TravisCI jobs have a different cause than the changes I introduced. Let me know if/how I can help to get this merged. Thanks!

sauy7 added a commit to fishbrain/delayed_job_active_record that referenced this pull request Sep 11, 2021