Keep track of which model(s) are in memory to support advanced batching (i.e. not pure FIFO)
Prioritization?
Queuing
Inference queue
Advanced batching: when the queue contains separate requests for the same model, batch them and run all jobs requesting that model before moving on to the next model. If other jobs are waiting in the queue, cap the time any one model stays in memory at 15-20 minutes. This balances efficiency (batching) against fairness (FIFO queuing).
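A minimal sketch of the batching policy above, assuming a single worker that holds one model in memory at a time and that jobs are not preempted mid-run. The names `Job`, `BatchingScheduler`, `next_job`, and `max_model_seconds` are hypothetical, and the default cap uses the lower end of the 15-20 minute range. `loaded_model` plays the role of "which model is in memory"; when the cap is hit and other models are waiting, the scheduler falls back to the oldest waiting job for a different model.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(eq=False)  # identity equality, so deque.remove() drops exactly this job
class Job:
    job_id: str
    model: str


class BatchingScheduler:
    """Hypothetical scheduler: batch jobs by model, with a per-model time cap."""

    def __init__(self, max_model_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.queue: deque[Job] = deque()         # jobs in arrival (FIFO) order
        self.loaded_model: Optional[str] = None  # model currently in memory
        self.loaded_since = 0.0                  # when loaded_model was loaded
        self.max_model_seconds = max_model_seconds
        self.clock = clock

    def submit(self, job_id: str, model: str) -> None:
        self.queue.append(Job(job_id, model))

    def next_job(self) -> Optional[Job]:
        if not self.queue:
            return None
        now = self.clock()
        same = [j for j in self.queue if j.model == self.loaded_model]
        others = [j for j in self.queue if j.model != self.loaded_model]
        over_budget = (self.loaded_model is not None
                       and now - self.loaded_since >= self.max_model_seconds)
        # Stay on the loaded model unless the time cap is hit AND other models wait.
        if same and not (others and over_budget):
            job = same[0]
        else:
            job = others[0]  # oldest job for a different model (FIFO fallback)
        self.queue.remove(job)
        if job.model != self.loaded_model:
            # Placeholder: a real worker would unload/load model weights here.
            self.loaded_model = job.model
            self.loaded_since = now
        return job
```

With jobs submitted as `a1` (model A), `b1` (model B), `a2` (model A), this sketch runs `a1`, `a2`, `b1`: both A jobs are batched before switching. If more than `max_model_seconds` elapse after A is loaded, the next call switches to `b1` even though `a2` is still queued. Because jobs are not preempted, a single long job can still overrun the cap; the check only happens between jobs.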
State management: Queuing