About scheduler #26

luzai · 2023-12-03T07:21:27Z

Thank you for your great work! May I ask about some details on the scheduler?

In the paper, it is mentioned that "To minimize latency penalty, we limit the prefill batch size to 1 for each batch." So if multiple requests are at the prefill stage, they will either be scheduled to different Runners or wait in a first-come, first-served queue. Is this understanding correct? Also, has the scheduling code for Section 5.1 (Scheduling new request) been released?
In Figure 2, what is the difference between a runner and the LLMs under a runner?

Looking forward to hearing from you~

luciferlinx101 · 2023-12-05T00:57:31Z

Yeah, I am also looking for the same thing!

jjjjohnson · 2023-12-07T10:08:02Z

It looks like there is no implementation of the scheduler in this repo?
