[SERVE][AUTOSCALERS] Replica scaling sampling period and stability. #4444
Comments
cc'ing @cblmemo
Hi @JGSweets! Thanks for reporting this. I think the first solution makes sense! Could you elaborate on the second one?
Error may have been a poor choice in wording. I meant that it hedges towards scaling up, since it rounds up on any fractional value. This means qps/replica drops significantly if it is barely breaching the requirement. This may be ideal behavior for some. Maybe this was chosen because if there's any qps, you want to guarantee scaling to 1?
That is one reason, but the primary reason is that we don't want any overload on the service.
It would also be convenient if this were a setting where we had a choice, similar to AWS alarms and triggers.
In autoscalers.py within serve:
skypilot/sky/serve/autoscalers.py, lines 258 to 269 in 3f62588
When a single qps check is below or above the threshold, the downscale_counter or upscale_counter is set to 0. This means a single jitter in qps could disrupt scaling.
I propose we allow a sampling over a period to allow scaling to occur based on a percentage of occurrences vs resetting to 0.
This could be set in the scaling policy.
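To make the proposal concrete, here is a rough sketch of what sampling over a period could look like. It is not SkyPilot's code: the class name WindowedScalingVote and the parameters window_size and trigger_fraction are hypothetical, and the thresholds are passed in by the caller. Instead of resetting a counter on one outlier, it keeps the last few checks and scales only when a chosen fraction of them agree.

```python
from collections import deque


class WindowedScalingVote:
    """Hypothetical helper: decide from the share of recent threshold breaches."""

    def __init__(self, window_size: int, trigger_fraction: float) -> None:
        if window_size <= 0:
            raise ValueError('window_size must be positive')
        if not 0.0 < trigger_fraction <= 1.0:
            raise ValueError('trigger_fraction must be in (0, 1]')
        self.trigger_fraction = trigger_fraction
        # +1: above the upscale threshold, -1: below the downscale
        # threshold, 0: in between. Old samples fall off automatically.
        self._samples = deque(maxlen=window_size)

    def record(self, qps_per_replica: float, upscale_threshold: float,
               downscale_threshold: float) -> str:
        """Record one qps check and return 'upscale', 'downscale' or 'hold'."""
        if qps_per_replica > upscale_threshold:
            self._samples.append(1)
        elif qps_per_replica < downscale_threshold:
            self._samples.append(-1)
        else:
            self._samples.append(0)

        # Do not decide until a full window of samples has been collected.
        if len(self._samples) < self._samples.maxlen:
            return 'hold'

        total = len(self._samples)
        up_share = sum(1 for s in self._samples if s == 1) / total
        down_share = sum(1 for s in self._samples if s == -1) / total
        if up_share >= self.trigger_fraction:
            return 'upscale'
        if down_share >= self.trigger_fraction:
            return 'downscale'
        return 'hold'


vote = WindowedScalingVote(window_size=5, trigger_fraction=0.6)
decisions = [vote.record(qps, upscale_threshold=10, downscale_threshold=5)
             for qps in (12, 12, 9, 12, 12)]  # the 9 is a single jitter
```

With a counter that resets, the single 9 would restart the upscale count; here four of the five checks still breach the threshold, so the last decision is an upscale.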
Also, since scaling utilizes math.ceil, it errors on the side of scaling up, keeping qps below the value as a max bar rather than a target.
skypilot/sky/serve/autoscalers.py, line 192 in 3f62588
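A small sketch of the difference between treating the value as a max bar and treating it as a target. The function names and the overload_tolerance parameter are hypothetical and only for illustration; the only thing taken from the issue is that the replica count is rounded up with math.ceil.

```python
import math


def replicas_max_bar(total_qps: float, target_qps_per_replica: float) -> int:
    """Round up on any fraction, so qps per replica never exceeds the value."""
    return math.ceil(total_qps / target_qps_per_replica)


def replicas_with_tolerance(total_qps: float, target_qps_per_replica: float,
                            overload_tolerance: float = 0.1) -> int:
    """Hypothetical alternative: let each replica run up to
    (1 + overload_tolerance) times the target before adding another."""
    if total_qps <= 0:
        return 0
    allowed_qps = target_qps_per_replica * (1 + overload_tolerance)
    # Keep at least one replica whenever there is any traffic.
    return max(1, math.ceil(total_qps / allowed_qps))


# Barely breaching a target of 10 qps per replica:
strict = replicas_max_bar(10.1, 10)
relaxed = replicas_with_tolerance(10.1, 10)
```

In this example the strict version goes to 2 replicas (about 5 qps each), while the tolerant version stays at 1. Which behavior is right depends on how much overload the service can accept, which is why making it a setting seems useful.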
Version & Commit info:
sky -v: 0.7.0
sky -c: 3f62588