Order of task processing #319
Comments
What version of gearmand are you using?
What do you get if you only have one worker?
I think that's unlikely, but you could try submitting the jobs using something other than PHP to eliminate that as a possibility. |
gearmand 1.1.18-161-ga95d1c1. I'm one version back for gearman; without a changelog it's not clear what was actually fixed. I would be surprised if this had not affected anyone else before, so I'm hoping it's a PECL issue. I'll write up a Python client test script to rule out PECL being the issue. |
On Mon, Aug 23, 2021, 4:26 AM Ricardo ***@***.***> wrote:
I am reading that the order in which jobs are dequeued for processing is the
order in which they are submitted to the server (single priority).
If you only ever use one priority, it's a FIFO, yes. But they're not
dequeued until they are marked complete by the worker. So what you mean is,
they are assigned in FIFO order.
Does this also apply to tasks which are run in parallel across multiple
workers?
I am queuing lots of tasks and waiting for them to finish, but they always
process in the reverse of the order in which they were added.
"Jobs" are FIFO, but if you're using foreground tasks, which is only a
concept in the client, not gearmand, it's possible they're added in reverse
order. They're not really an ordered concept. The idea is to add them all,
and wait for them all.
I know that there will be slightly different order based on processing
time but this just relates to general order they are given to workers
rather than completion.
Is this expected in the server or a problem caused by the PECL library
that should be submitting the jobs to gearmand?
Do not rely on this ordering. It's not guaranteed by the protocol and any
ordering you see is unintentional.
|
I've checked, and using a single client/worker the processing is exactly the reverse of the order in which the tasks were added. @SpamapS I do not need the order to be perfect, but it should at least be roughly the order in which I added them, give or take. |
I think what @SpamapS is saying is that the order that jobs are assigned to workers is FIFO, but the order that the job results are returned to the client is unspecified and not guaranteed. You should not rely on that order being consistent. Regardless, opening an issue with php/pecl-networking-gearmand seems premature, imho. Here's what I would do to test this:
Do the above for N=1, 5, and 10. |
I added output results to PECL ticket I created. |
I don’t think it was clear from your initial description that you were submitting a batch of multiple tasks from a single client in one fell swoop. That’s not a typical Gearman use-case, I think. I think the typical use-case is having multiple clients submitting multiple jobs to the job server one job at a time. Possible workaround: Try submitting the jobs from the client (one at a time) as background tasks instead. Refer to https://www.php.net/manual/en/gearmanclient.dobackground.php for details. When you call |
The test is from a single client but in production this process can happen by multiple clients at the same time, but I am really only concerned about it when looking at a single client as what they do does not cross with other clients. PECL claims that they are just using I am not up with C so I can only go by what the plugin dev says. Workaround using background tasks is not ideal as it means tracking all the job handles and checking all of them are done etc. |
Yeah, it's more complicated and will take more PHP code, but I'm fairly sure it would work the way you want and it would be the most expedient solution, entirely within your control. The other workaround option is to just call |
On Wed, Sep 22, 2021, 8:24 AM Ricardo ***@***.***> wrote:
I added output results to PECL ticket I created.
It shows that the tasks (in a batch using runTasks) are always processed in
the reverse of the order in which they were added.
I.e. if I add six tasks (1,2,3,4,5,6) and then call runTasks, they will be run
on the workers in the order (6,5,4,3,2,1). If I add multiple workers the
order is still reversed, but you can see the order change (6,4,5,2,3,1),
which varies with how quickly each worker responds; in my situation many of
the tasks take roughly the same time to process.
I'm sorry this isn't clear from the docs, but the entire point of gearman,
*especially the task system*, is having multiple workers.
If you are not using multiple workers, you are basically wasting your time
with gearman and you should just use zeromq or grpc or a rest API and just
send requests directly to the server.
The point of tasks specifically is that a client has a bunch of independent
things that it needs done as quickly as possible. You send them all to
gearman at once and let the many workers do things concurrently, handling
the responses as they come in. If you have *any* order dependence, then you
need to wait for the responses before sending the dependent work.
Yes, I know it's a FIFO buffer, but the task batch is being submitted to the
server in reverse order, or the task array items are being popped off the
end of the array and then sent instead of working from the start.
I do not expect perfect ordering, but if I add 100 tasks I would like it
to bear some resemblance to the order I submitted them in.
With smaller batches of 2/3/4 etc. it does not matter, but as I get to
50/100/200/500 it does.
|
I've been using it for nearly 10 years and I know how the worker/client model works; we have a system with a hundred workers doing image/video processing, various file transfer jobs and other background tasks. I do not have a hard dependence on order; it would just be nice if, when submitting 100 tasks as a batch, they came back in some sort of order. Sure, if I have 100 workers then the order will be anything at all, but if I have only 2/5 workers it would be good if they came back in an order similar to how they were queued up. I am not sure if something is being lost in translation: it's 100% processed in the reverse of the order in which the tasks were added before submitting the task batch. I am not asking for anything other than pushing the tasks to the server in the order in which they were added in the first place when running the task batch, so that workers take them off the stack in roughly the same order. I am aware that processing time means results can come back out of order.
|
On Thu, Sep 23, 2021 at 1:31 PM Ricardo ***@***.***> wrote:
I've been using it for nearly 10 years, I know how the workers/clients
model works, we have a system with a hundred workers doing image/video
processing, various file transfer jobs and other background tasks.
Wonderful, thanks for using Gearman. :)
I do not have a hard dependence on order; it would just be nice that
when submitting 100 tasks as a batch they come back in some sort of
order. Sure, if I have 100 workers then the order will be anything at all,
but if I have only 2/5 workers it would be good that they come back in
an order similar to how they were queued up.
I appreciate that, but can you maybe give us an objective target rather
than "it would be nice"? The task system has been architected in a very
particular way, and adding ordering requirements would make it even more
complex. When you add a task, gearman looks through the server list, and
picks a server based on a hash of the unique value you passed (if you
passed one, otherwise it is generated). This way, clients load balance
naturally among gearmands, but if there are multiple clients with the same
unique request, both of them will send to the same gearmand, resulting in
both of them only needing to trigger the worker once*. This is *really*
important for use cases that rely on it, but it's not all that easy to do.
One way libgearman seems to have made it simpler during implementation (I
wasn't around for this) was that it was done as a stack of pointers to
tasks that have been assigned to a particular client connection. I
challenge you to go read the code and figure this out. Every 3 years
somebody asks a question and I have to re-familiarize myself with it.
That's one reason I want to replace gearmand and libgearman with a Rust
implementation. The C is just impossibly complex IMO.
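The server-selection idea described above can be sketched roughly as follows. This is only an illustration of hashing a unique value onto a server list: the thread does not say which hash libgearman uses, so SHA-256 here is an assumption, and `pick_server`, the host names, and the example unique key are all hypothetical.

```python
import hashlib


def pick_server(unique: str, servers: list[str]) -> str:
    """Deterministically map a unique key to one entry of the server list.

    Assumption: SHA-256 stands in for whatever hash libgearman really uses.
    """
    digest = hashlib.sha256(unique.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and wrap it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical server list; 4730 is the usual gearmand port.
servers = ["gearmand-a:4730", "gearmand-b:4730", "gearmand-c:4730"]

# Two independent clients submitting the same unique value pick the same
# server, so that server can coalesce the two requests into one job.
client_one = pick_server("resize:image-42.png", servers)
client_two = pick_server("resize:image-42.png", servers)
```

Because the choice depends only on the unique value and the server list, clients need no coordination to agree, as long as they are configured with the same list in the same order.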
Anyway, I respect that you have seen this effect. But I still have not seen
a problem statement that would warrant refactoring that part of the system.
I am not sure if something is being lost in translation: it's 100%
processed in the reverse of the order in which the tasks were added before
submitting the task batch. I am not asking for anything other than pushing
the tasks to the server in the order in which they were added in the first
place when running the task batch, so that workers take them off the stack
in roughly the same order. I am aware that processing time means results
can come back out of order.
client - addTask - 1
client - addTask - 2
...
client - addTask - 99
worker 1 - task 99
worker 2 - task 98
worker 1 - task 97
worker 1 - task 96
worker 2 - task 95
worker 3 - task 94
worker 2 - task 93
etc
instead of
worker 1 - task 1
worker 2 - task 2
worker 1 - task 3
worker 1 - task 4
worker 2 - task 5
worker 3 - task 6
etc
Nothing lost in translation, I'm just not really sure I see an actual
problem for this low level system to solve. You could solve it also by
adding tasks to your own stack, and just before you wait, add them to
libgearman in reverse order. That would result in them being submitted in
the original order.
*I wrote about it on my blog here:
https://fewbar.com/2020/04/gearman-is-misunderstood/
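A minimal sketch of the "add them in reverse" workaround suggested above, using a stand-in client rather than a real Gearman binding. `LifoBatchClient` and `submit_in_order` are hypothetical names; the class only imitates the last-added, first-sent behaviour reported in this thread and does not talk to gearmand.

```python
class LifoBatchClient:
    """Hypothetical stand-in for a client that sends a batch last-added-first."""

    def __init__(self):
        self._pending = []

    def add_task(self, payload):
        # Tasks are pushed onto a stack until the batch is run.
        self._pending.append(payload)

    def run_tasks(self):
        # Pop from the end, so the most recently added task is sent first.
        sent = []
        while self._pending:
            sent.append(self._pending.pop())
        return sent


def submit_in_order(client, payloads):
    """Collect payloads locally, then add them reversed so the LIFO send
    restores the original order."""
    for payload in reversed(payloads):
        client.add_task(payload)
    return client.run_tasks()


wanted = [f"task-{n}" for n in range(1, 7)]

# Naive use: add in the desired order, observe the reversed send order.
naive = LifoBatchClient()
for payload in wanted:
    naive.add_task(payload)
naive_order = naive.run_tasks()

# Workaround: reverse before adding.
fixed_order = submit_in_order(LifoBatchClient(), wanted)
```

With a real client the same shape applies: build the list first, then call the binding's add-task method over the reversed list just before running the batch. This only affects the order in which jobs are submitted; completion order still depends on the workers.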
|
I agree with the original poster in that the jobs should be assigned to workers in the general order that they came in. And that's what we see (FIFO behavior) with background jobs on Gearman 1.1.18 . This order makes more sense when you have long queues that need hours to process. Here's our story: We use Gearman to process tens of thousands of recordings per day, but one day we had to process hundreds of thousands. We also use two priorities: high and low (and only one function). Because there was a flood of recordings, Gearman kept shipping the high-priority jobs to workers 1st. Because workers could not keep up, the low-priority jobs stacked up. All this is normal and expected. Once the flood ended, workers quickly picked up and finished any high-priority jobs and started work on the low-priority ones. At this point, there were let's say about 24 hours worth of (low priority) recordings/jobs waiting to be processed. From our |
Let's be really clear though: Jobs are assigned to workers in the order they are given to the server. However, the task system in libgearman is an abstraction above jobs, and sends these "tasks" as jobs. It sends them all at one time, and it happens to send them LIFO. This order isn't really defined in the docs. It only says that the task is added to the client structure, and ... well now .. I found a funny doc bug: Now, with that fixed, the way it is intended to work is that a bunch of tasks are added and then sent to the servers all at once. Making it FIFO would be a feature change and I'm not against it but it deserves a proper reason. If you want to send them in a particular order, you can now, just use Anyway, this isn't a bug, but I will leave it here as an incomplete enhancement request. If anyone wants to make clear what the purpose is, and write up the patch, it will of course be considered. |
I am reading that the order in which jobs are dequeued for processing is the order in which they are submitted to the server (single priority).
Does this also apply to tasks which are run in parallel across multiple workers?
I am queuing lots of tasks and waiting for them to finish, but they always process in the reverse of the order in which they were added.
I know that there will be slightly different order based on processing time but this just relates to general order they are given to workers rather than completion.
Is this expected in the server or a problem caused by the PECL library that should be submitting the jobs to gearmand?