
Update worker autoscaling analyses priority. #1148

Open
sambles opened this issue Nov 27, 2024 · 1 comment
sambles commented Nov 27, 2024

Issue Description

In the current auto-scaler logic, ModelStates are aggregations of an OasisModel and its queued/running analyses:

```python
class ModelState(TypedDict):
    """
    Used in the model states dict to store information about each model's
    current state: for now, the number of tasks and analyses for each model.
    """
    tasks: int
    analyses: int
    priority: int
```

In the current code, this grouping inherits the highest priority from all of the analyses:

```python
if priority > model_state['priority']:
    model_state['priority'] = priority
```
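To make the inherited-priority behaviour concrete, here is a minimal sketch of the aggregation described above. The `aggregate` function and the analysis dicts are hypothetical illustrations, not the autoscaler's actual API:

```python
from typing import TypedDict


class ModelState(TypedDict):
    tasks: int
    analyses: int
    priority: int


def aggregate(analyses: list[dict]) -> ModelState:
    """Hypothetical aggregation mirroring the behaviour described above:
    the whole model inherits the single highest analysis priority."""
    state: ModelState = {'tasks': 0, 'analyses': 0, 'priority': 0}
    for a in analyses:
        state['tasks'] += a['tasks']
        state['analyses'] += 1
        if a['priority'] > state['priority']:
            state['priority'] = a['priority']
    return state


# One priority-10 analysis lifts the entire model to priority 10,
# even though the other analysis only asked for priority 1.
print(aggregate([
    {'priority': 1, 'tasks': 4},
    {'priority': 10, 'tasks': 2},
]))  # → {'tasks': 6, 'analyses': 2, 'priority': 10}
```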

Testing required: This might lead to starvation of other model workers, given that all queues share the same pool of nodes to draw VMs from.

A suggested improvement is to scale based on the number of "slots", i.e. the number of concurrent task threads a worker can process.

e.g. if an analysis is (priority=10, tasks=15), then 15 slots are assigned that priority, rather than the whole model inheriting it.
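The slot-based alternative could be sketched as follows; `slot_priorities` is a hypothetical name, and the analysis dicts are illustrative, not the autoscaler's real data model:

```python
from collections import Counter


def slot_priorities(analyses: list[dict]) -> Counter:
    """Hypothetical slot-based scaling: each analysis contributes
    'tasks' slots at its own priority, instead of the whole model
    inheriting the maximum priority across analyses."""
    slots: Counter = Counter()
    for a in analyses:
        slots[a['priority']] += a['tasks']
    return slots


# (priority=10, tasks=15) claims 15 slots at priority 10;
# the priority-1 analysis keeps its own 4 slots at priority 1.
print(slot_priorities([
    {'priority': 10, 'tasks': 15},
    {'priority': 1, 'tasks': 4},
]))  # → Counter({10: 15, 1: 4})
```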

sambles commented Nov 27, 2024

Sam: Thinking about it again, shouldn't the model_state aggregate priority value drop once the high-priority task has completed?

Amir: Yes, it does go down, but before the highest-priority analysis finishes, all other low-priority analyses of the same model also effectively get the highest priority (because the whole model gets it), and those low-priority analyses will block higher-priority analyses from other models. This lasts until the highest-priority analysis ends, at which point the situation is rectified, but those analyses could run for many hours, blocking an analysis with higher priority from another model.

That is not desirable: we should have workers lined up according to the priority of analyses, regardless of model.
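Lining workers up by per-analysis priority across all models, as suggested above, could look like this sketch (`dispatch_order` and the queue entries are hypothetical, not the scheduler's actual interface):

```python
def dispatch_order(analyses: list[dict]) -> list[dict]:
    """Order all queued analyses across every model by their own
    priority (highest first), so a low-priority analysis can no longer
    ride on its model's inherited priority. sorted() is stable, so
    equal priorities keep their submission order."""
    return sorted(analyses, key=lambda a: a['priority'], reverse=True)


queue = [
    {'model': 'A', 'priority': 1},
    {'model': 'B', 'priority': 10},
    {'model': 'A', 'priority': 4},
]
print([a['model'] for a in dispatch_order(queue)])  # → ['B', 'A', 'A']
```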
