Is there a way to implement FIFO stack / job re-prioritization? #395

Open
dmeziere opened this issue Jul 9, 2024 · 18 comments

Comments

@dmeziere

dmeziere commented Jul 9, 2024

Hello,

I use Gearman to drive an ETL farm. I think it is the perfect solution, and you did a really great job, but I have one need that is not covered.

  • 3 gearman-job-server/jammy,now 1.1.19.1+ds-2build1 amd64
  • 3 php 8.0 Gearman clients
  • 35 php 8.0 Gearman workers

We lack processing power, and therefore workers, so there are frequent traffic jams. Each hour, our client submits a thousand jobs, but the queue is not always drained before the next batch arrives. Gearman seems to work as a LIFO stack, so a few jobs are always delayed again and again by newer jobs being submitted. The number of delayed jobs grows from hour to hour until a low-traffic hour or a crash (not on the Gearman side, it is rock-solid).

Is there a way to use Gearman as a FIFO queue, or to re-prioritize existing jobs before adding new ones? Here is what I mean:

  • By FIFO queue I mean that newly added jobs are only handled after the existing ones

  • By re-prioritizing existing jobs I mean that each hour, when our client starts, it could first mark all existing jobs as high priority, then add the new jobs as normal priority, emulating a FIFO queue

  • or anything else that could solve my problem

@esabol
Member

esabol commented Jul 9, 2024

Jobs are assigned to workers in the order they are given to the server (FIFO). However, the task system in libgearman as used by PHP clients is an abstraction above jobs, and it sends these "tasks" as jobs. It sends them all at one time, and it happens to send them LIFO. Refer to the discussion in issue #319.

Basically, if you change how you submit the tasks/jobs in your clients (hint: use doBackground), you should get FIFO.

Alternatively, you are welcome to contribute a PR which changes the order that tasks are added in libgearman to be FIFO.

I also think you need to add more workers until the rate of jobs you can complete exceeds the rate of jobs that you add. Try doubling or tripling the number of workers you have.
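The point about completion rate versus arrival rate can be illustrated with a small back-of-the-envelope sketch. The figure of 1000 jobs per hour comes from the original post; the hourly completion capacities (900 and 1100) are hypothetical numbers chosen only for illustration, and the model ignores job duration variance and submission order.

```python
def backlog_over_time(arrivals_per_hour, completions_per_hour, hours, initial_backlog=0):
    """Return the number of queued jobs left at the end of each hour."""
    backlog = initial_backlog
    history = []
    for _ in range(hours):
        # New jobs arrive, workers complete what they can; the queue cannot go negative.
        backlog = max(0, backlog + arrivals_per_hour - completions_per_hour)
        history.append(backlog)
    return history


# 1000 jobs/hour is from the thread; 900 and 1100 are hypothetical capacities.
under_provisioned = backlog_over_time(1000, 900, hours=5)
over_provisioned = backlog_over_time(1000, 1100, hours=5)

print(under_provisioned)  # [100, 200, 300, 400, 500]
print(over_provisioned)   # [0, 0, 0, 0, 0]
```

As long as capacity stays below the arrival rate, the backlog grows every hour regardless of the order in which jobs are processed; ordering only decides which jobs end up waiting.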

esabol changed the title from "Is there a way to implement FIFO stack / job re-priorization ?" to "Is there a way to implement FIFO stack / job re-prioritization?" Jul 9, 2024
esabol added the question label Jul 9, 2024
@dmeziere
Author

dmeziere commented Jul 9, 2024

Thank you for this lead. Adding more workers, in my case, means adding more physical servers (I have already pushed the number of processes per machine to a comfortable ratio), and the costs will explode. That said, if I can prevent a not-yet-finished import from being pushed back again and again by incoming ones, it will be a major upgrade!

When I said "35 PHP workers", I meant 7 physical servers, each hosting 5 VMs, each using 3 worker processes.

@SpamapS
Member

SpamapS commented Jul 9, 2024 via email

@dmeziere
Author

dmeziere commented Jul 10, 2024

Does doBackground() have other behavioural differences from addTask() / runTasks()? I mean the jobs are executed, but gearadmin can't see them, and it looks like the callbacks are not executed. I use them a lot to generate a Gantt diagram showing all the jobs in real time. With doBackground(), nothing works at the monitoring level.

[edit] I can now see the jobs with gearadmin. The documentation (which is a bit light for my taste) states that callback handling only works with runTasks(). I really need this behaviour, so this is a problem for me.

@esabol
Member

esabol commented Jul 10, 2024

Just to be clear, the PHP extension is a separate project, and we are not responsible for it (except that it uses libgearman.so under the hood and we are responsible for that). If doBackground does not fit your needs, you are welcome to submit a PR which changes the behavior of libgearman, as mentioned previously.

@SpamapS
Member

SpamapS commented Jul 10, 2024 via email

@dmeziere
Author

@esabol I am not blaming anyone or anything. I love Gearman! I am just trying to understand and locate where my problem is, and to find the cheapest solution to it. Believe me, if I could provide quality code in C / Boost, I would be proud to contribute if it were necessary. The only thing I said is that the Gearman documentation on the PHP website (which I understand is not related to gearmand) could be improved.

@dmeziere
Author

@SpamapS I am not using GearmanClient::do. I experimented with it this week thanks to your help on this issue, but I did not go very far because I rely heavily on the callbacks and communication provided by GearmanClient tasks to manage my jobs. I managed to run my jobs with GearmanClient::do, but without any feedback, of course.
I also have a second function, but it is named per "workshop" (a group of workers handled by a PHP master process in my project). It is used to warm up an import before running the real jobs, which have the same function name for the whole farm.

@SpamapS
Member

SpamapS commented Jul 12, 2024 via email

@esabol
Member

esabol commented Jul 22, 2024

I'm kind of wondering if this is actually a problem with the PHP extension after all. The implementation for the addTask method in the PHP extension has a comment that says "prepend task to list of tasks on client obj", which would seem to imply that it's the one that's setting the order to LIFO instead of FIFO.

gearman_client_add_task_handler (https://www.php.net/manual/en/gearmanclient.addtask.php):
https://github.com/php/pecl-networking-gearman/blob/a52052cdd712a95091ce926be3bcdca41c730696/php_gearman_client.c#L736

@SpamapS
Member

SpamapS commented Jul 28, 2024

No, that's a bit of a ruse, that's just how it's managing its own data structures. It happens here:

https://github.com/gearman/gearmand/blob/master/libgearman/packet.cc#L190-L199

Tasks are stored in the universal here until run_tasks is run. For whatever reason, they decided to prepend rather than append. As we've said before, if you want to use tasks FIFO, then you have to add them in reverse order.

The docs don't define this order, but I don't think we could change it without most likely breaking some folks.

We could probably add a new universal option to reverse the order, and if nothing else, maybe we should document that they are LIFO.
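The prepend behaviour described in this comment, and the "add them in reverse order" workaround, can be sketched with a toy model. This is an illustration only, not libgearman's actual code: the `TaskList` class and its method names are hypothetical stand-ins for the client-side task list.

```python
class TaskList:
    """Toy model of a client-side task list that prepends new tasks."""

    def __init__(self):
        self._tasks = []

    def add_task(self, payload):
        # Prepend: the most recently added task becomes the head of the list.
        self._tasks.insert(0, payload)

    def run_tasks(self):
        # Submit from head to tail and empty the list.
        submitted, self._tasks = self._tasks, []
        return submitted


jobs = ["job 1", "job 2", "job 3"]

# Adding tasks in natural order: they are submitted newest-first (LIFO).
naive = TaskList()
for job in jobs:
    naive.add_task(job)
naive_order = naive.run_tasks()

# Workaround from the thread: add tasks in reverse so submission is FIFO.
workaround = TaskList()
for job in reversed(jobs):
    workaround.add_task(job)
workaround_order = workaround.run_tasks()

print(naive_order)       # ['job 3', 'job 2', 'job 1']
print(workaround_order)  # ['job 1', 'job 2', 'job 3']
```

The reversal only affects the order within one batch of tasks; it says nothing about how that batch is ordered relative to jobs already queued on the server.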

@esabol
Member

esabol commented Jul 29, 2024

I think undocumented behavior is subject to change, personally, and I really doubt anyone wants LIFO. Just my two cents.

@dmeziere
Author

If I may add weight to the FIFO behaviour: the problem is not when one adds a bunch of jobs to an empty queue. In that case one can, as previously said, reverse the order of submission if desired. But when one adds a bunch of jobs to an already filled queue, the oldest jobs will be pushed back by the new ones. And if the same thing happens many times, the oldest jobs will never be handled. Please excuse me if I'm not clear, my English may be deficient.

@SpamapS
Member

SpamapS commented Jul 29, 2024 via email

@SpamapS
Member

SpamapS commented Jul 30, 2024 via email

@esabol
Member

esabol commented Jul 30, 2024

> But when one adds a bunch of jobs to an already filled queue, the oldest jobs will be pushed back by the new ones. And if the same thing happens many times, the oldest jobs will never be handled. Please excuse me if I'm not clear, my English may be deficient.

Just to be clear, we don't believe that is true. Once the jobs are in gearmand's queue, all jobs are processed in FIFO order. It's addTask/runTasks that submits the tasks to gearmand in LIFO order. If you submit each task to gearmand as separate jobs using PHP's doBackground or doNormal, I think you would see that.

If your experience is different, please provide a simple reproducible test case that submits a bunch of jobs with simple payloads like "job N" and have the workers return the job payload appended with timestamps of when they are processed by the workers.
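The behaviour described in this comment can be modelled with a short simulation. It is a sketch under stated assumptions, not a reproduction of gearmand: the server is modelled as one plain FIFO queue for a single function, each batch reaches the server newest-first (as with addTask/runTasks), and the `simulate` function and its parameters are hypothetical.

```python
from collections import deque


def simulate(batches, batch_size, completions_per_round):
    """Return the order in which (batch, job) pairs are handed to workers."""
    queue = deque()
    processed = []
    for batch in range(batches):
        # Client side: the tasks of one batch arrive newest-first (LIFO within the batch).
        for job in reversed(range(batch_size)):
            queue.append((batch, job))
        # Server side: workers always take jobs from the front of the queue (FIFO).
        for _ in range(min(completions_per_round, len(queue))):
            processed.append(queue.popleft())
    # Drain whatever is left once no new batches arrive.
    while queue:
        processed.append(queue.popleft())
    return processed


order = simulate(batches=3, batch_size=4, completions_per_round=3)
batch_sequence = [batch for batch, _ in order]

print(batch_sequence)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(order[:4])       # [(0, 3), (0, 2), (0, 1), (0, 0)]
```

In this model a leftover job from an older batch is always handed out before any job of a newer batch, so nothing is starved; only the order inside each batch is reversed. If a real deployment shows older batches being overtaken, that would point to something this model leaves out.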

@dmeziere
Author

It's complicated. I am alone on the project, totally overloaded, and my usage of Gearman is far from simple. Besides the development, I also handle all the server infrastructure (75 hosts). And summer is the only period when I can migrate all the solutions we use to their latest versions without disturbing our customers. I will try to find the time, but it is a tough period for me.

@SpamapS
Member

SpamapS commented Jul 30, 2024

Please give --round-robin a try on your gearmand. If that doesn't fix it, then yes, if you can extract just the gearman bits of your PHP out and paste here, we can confirm if it is the library doing LIFO with tasks as we've been talking about, or something else.

3 participants