-
-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use proper fields for finding prunable jobs #1145
Conversation
Using "scheduled_at" is not correct. Depending on job state, one of "cancelled_at", "discarded_at" or "completed_at" should be used.
lib/oban/engines/basic.ex
Outdated
@@ -153,9 +153,10 @@ defmodule Oban.Engines.Basic do | |||
subquery = | |||
queryable | |||
|> select([:id, :queue, :state]) | |||
|> where([j], j.state in ~w(completed cancelled discarded)) | |||
|> where([j], j.state == "completed" and j.completed_at < ^time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We switched to the scheduled_at
timestamp to make use of the compound index and speed up pruning. There are likely to be many more completed
jobs than cancelled
or discarded
, and a completed
job has to execute to end up in that state. So, let's use scheduled_at
for the completed
state to make use of that index.
|> where([j], j.state == "completed" and j.completed_at < ^time) | |
|> where([j], j.state == "completed" and j.scheduled_at < ^time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes perfect sense performance-wise
If you do not do some tricky stuff, completed, cancelled and discarded rows never change. Probably they could all use a single timestamp field instead of 3, with a compound index on state and that field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant? queue column is not nullable
where([j], not is_nil(j.queue))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_nil
check is pointless with the current index structure, but it was required to force index usage with the old compound table where queue
was before state
. The index change was optional (way back when), and some people never chose that option.
We can probably remove it at this point, but it doesn't change the query plan.
Thanks for submitting this improvement! |
Would you consider this?
May be a bit of work and require a heavy migration though |
In theory, yes. It would require a few migrations for zero downtime and coordinated changes between packages. That's a lot and I'm not convinced it's worth it. |
Using "scheduled_at" is not correct. Depending on job state, one of "cancelled_at", "discarded_at" or "completed_at" should be used.