
Add state management and queuing #1

KastanDay opened this issue Oct 11, 2023 · 0 comments

KastanDay commented Oct 11, 2023 (edited by zhourrr):

State management:

Keep track of which model(s) are in memory, to support advanced batching (NOT pure FIFO)
Prioritization?
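A minimal sketch of the state-tracking piece, in Python (the issue names no language). `ModelMemoryState`, the `capacity` parameter, and the rule of evicting the oldest-loaded model are hypothetical choices for illustration, not something the issue specifies. The load timestamp is kept so a scheduler can tell how long a model has been resident.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ModelMemoryState:
    """Hypothetical tracker for which model(s) are currently resident in memory."""
    capacity: int = 1  # assumption: how many models fit in GPU memory at once
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    loaded: Dict[str, float] = field(default_factory=dict)  # model -> load time

    def is_loaded(self, model: str) -> bool:
        return model in self.loaded

    def load(self, model: str) -> Optional[str]:
        """Mark `model` as loaded; evict the oldest-loaded model if at capacity.

        Returns the name of the evicted model, or None if nothing was evicted.
        """
        if model in self.loaded:
            return None
        evicted = None
        if len(self.loaded) >= self.capacity:
            # Assumption: evict whichever model was loaded earliest.
            evicted = min(self.loaded, key=self.loaded.get)
            del self.loaded[evicted]
        self.loaded[model] = self.clock()
        return evicted

    def time_in_memory(self, model: str) -> float:
        """Seconds since `model` was loaded (raises KeyError if not loaded)."""
        return self.clock() - self.loaded[model]
```

Passing a fake `clock` makes the timing logic testable without waiting in real time.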

Queuing:

Inference queue
Advanced batching: when the queue contains separate requests for the same model, batch them and run every job requesting that model before moving on to the next model. If other jobs are waiting in the queue, cap the time any one model stays in memory at 15-20 minutes. This should balance efficiency (batching) with fairness (FIFO queuing).
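The batching rule above can be sketched as a scheduler that stays on the model already in memory until either no jobs for it remain, or its time budget is exhausted while other jobs wait; it then falls back to FIFO by picking the oldest job that needs a different model. This is a minimal sketch under stated assumptions: `Job`, `BatchingQueue`, `next_job`, and `max_residency_s` are hypothetical names, and the 15-minute default is simply the lower end of the range in the issue.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class Job:
    job_id: str
    model: str  # the model this inference request needs


class BatchingQueue:
    """Hypothetical model-aware queue: batch same-model jobs, cap time per model."""

    def __init__(self, max_residency_s: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_residency_s = max_residency_s  # assumption: lower end of "15-20 minutes"
        self.clock = clock
        self.jobs: Deque[Job] = deque()  # pending jobs in arrival (FIFO) order
        self.current_model: Optional[str] = None
        self.model_since = 0.0  # when current_model was switched in

    def submit(self, job: Job) -> None:
        self.jobs.append(job)

    def next_job(self) -> Optional[Job]:
        if not self.jobs:
            return None
        now = self.clock()
        same = [j for j in self.jobs if j.model == self.current_model]
        others_waiting = len(same) < len(self.jobs)
        over_budget = now - self.model_since >= self.max_residency_s
        if same and not (over_budget and others_waiting):
            # Stay on the model already in memory: run its oldest pending job.
            job = same[0]
        else:
            # Switch models: take the oldest job that needs a different model.
            job = next(j for j in self.jobs if j.model != self.current_model)
            self.current_model = job.model
            self.model_since = now
        self.jobs.remove(job)
        return job
```

For example, with jobs submitted for `llama`, `mistral`, `llama`, `mistral`, the queue runs both `llama` jobs before either `mistral` job. The budget is only checked between jobs, so a running job is never preempted, and a model that is over budget keeps running as long as nothing else is waiting.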
