[SERVE][AUTOSCALERS] Replica scaling sampling period and stability. #4444

Open
JGSweets opened this issue Dec 5, 2024 · 5 comments

@JGSweets
Contributor

JGSweets commented Dec 5, 2024

In autoscalers.py within serve:

    elif target_num_replicas > self.target_num_replicas:
        self.upscale_counter += 1
        self.downscale_counter = 0
        if self.upscale_counter >= self.scale_up_consecutive_periods:
            self.upscale_counter = 0
            self.target_num_replicas = target_num_replicas
    elif target_num_replicas < self.target_num_replicas:
        self.downscale_counter += 1
        self.upscale_counter = 0
        if self.downscale_counter >= self.scale_down_consecutive_periods:
            self.downscale_counter = 0
            self.target_num_replicas = target_num_replicas

When a single QPS check lands on the other side of the threshold, the counter that was accumulating (upscale_counter or downscale_counter) is reset to 0.
This means a single jitter in QPS can disrupt scaling.
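
To make the reset behavior concrete, here is a minimal sketch that mirrors only the two branches quoted above. The class name `ConsecutiveCounterSketch`, the `simulate` helper, and the sample values are hypothetical and exist only for illustration; the real autoscaler has more logic around these branches.

```python
class ConsecutiveCounterSketch:
    """Hypothetical stand-in that copies only the two quoted branches."""

    def __init__(self, target_num_replicas, scale_up_consecutive_periods,
                 scale_down_consecutive_periods):
        self.target_num_replicas = target_num_replicas
        self.scale_up_consecutive_periods = scale_up_consecutive_periods
        self.scale_down_consecutive_periods = scale_down_consecutive_periods
        self.upscale_counter = 0
        self.downscale_counter = 0

    def observe(self, target_num_replicas):
        # Same structure as the quoted snippet: each branch resets the
        # counter of the opposite direction.
        if target_num_replicas > self.target_num_replicas:
            self.upscale_counter += 1
            self.downscale_counter = 0
            if self.upscale_counter >= self.scale_up_consecutive_periods:
                self.upscale_counter = 0
                self.target_num_replicas = target_num_replicas
        elif target_num_replicas < self.target_num_replicas:
            self.downscale_counter += 1
            self.upscale_counter = 0
            if self.downscale_counter >= self.scale_down_consecutive_periods:
                self.downscale_counter = 0
                self.target_num_replicas = target_num_replicas
        return self.target_num_replicas


def simulate(observations, start=2, up_periods=3, down_periods=3):
    """Feed a series of per-check desired replica counts; return the final target."""
    scaler = ConsecutiveCounterSketch(start, up_periods, down_periods)
    for desired in observations:
        scaler.observe(desired)
    return scaler.target_num_replicas


# Six of eight checks ask for 3 replicas, but every third check dips to 1,
# so upscale_counter never reaches 3 and the target stays at 2.
jittery_result = simulate([3, 3, 1, 3, 3, 1, 3, 3])
# Three uninterrupted checks are enough to scale up.
steady_result = simulate([3, 3, 3])
```

In this made-up series, 75% of the checks agree that more replicas are needed, yet the target never moves.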

I propose sampling over a period so that scaling is triggered by the percentage of checks in that period that call for it, rather than by a consecutive count that resets to 0 on a single contrary check.
This could be set in the scaling policy.
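
One possible shape for this proposal, as a rough sketch only: keep the last N checks in a sliding window and scale when a configurable fraction of them agree. The names `WindowedScalerSketch`, `window_size`, `upscale_fraction`, and `downscale_fraction` are assumptions made up for this example and are not existing policy fields.

```python
from collections import deque


class WindowedScalerSketch:
    """Hypothetical percentage-of-window alternative to consecutive counters."""

    def __init__(self, target_num_replicas, window_size,
                 upscale_fraction, downscale_fraction):
        self.target_num_replicas = target_num_replicas
        self.upscale_fraction = upscale_fraction
        self.downscale_fraction = downscale_fraction
        # deque with maxlen drops the oldest sample automatically.
        self.window = deque(maxlen=window_size)

    def observe(self, desired_num_replicas):
        self.window.append(desired_num_replicas)
        # Wait until a full window of samples has been collected.
        if len(self.window) < self.window.maxlen:
            return self.target_num_replicas

        ups = [d for d in self.window if d > self.target_num_replicas]
        downs = [d for d in self.window if d < self.target_num_replicas]
        n = len(self.window)

        if len(ups) / n >= self.upscale_fraction:
            # Design choice for this sketch: adopt the most recent sample
            # that voted for scaling up, then start a fresh window.
            self.target_num_replicas = ups[-1]
            self.window.clear()
        elif len(downs) / n >= self.downscale_fraction:
            self.target_num_replicas = downs[-1]
            self.window.clear()
        return self.target_num_replicas


def run_window(observations, start=2, window_size=8, fraction=0.7):
    """Feed a series of desired replica counts; return the final target."""
    scaler = WindowedScalerSketch(start, window_size, fraction, fraction)
    for desired in observations:
        scaler.observe(desired)
    return scaler.target_num_replicas
```

With a window of 8 and a 70% threshold, a series such as `[3, 3, 1, 3, 3, 1, 3, 3]` scales up to 3, because 6 of 8 samples (75%) ask for more replicas and the two dips no longer reset anything.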


Also, since scaling utilizes math.ceil, it errors on scaling and keeping qps below the value as a max bar vs a target.

target_num_replicas = math.ceil(


Version & Commit info:

@Michaelvll
Collaborator

cc'ing @cblmemo

@Michaelvll Michaelvll added the OSS label Dec 19, 2024 — with Linear
@Michaelvll Michaelvll removed the OSS label Dec 19, 2024
@cblmemo
Collaborator

cblmemo commented Dec 21, 2024

> Also, since scaling utilizes math.ceil, it errors on scaling and keeping qps below the value as a max bar vs a target.

Hi @JGSweets! Thanks for reporting this. I think the first solution makes sense! Could you elaborate on the second one?

@JGSweets
Contributor Author

Error may have been a poor choice in wording. I meant that it hedges towards scaling up since it rounds up with any float. This means qps/replica drops significantly if it is barely breaching the requirement. This may be ideal behavior for some.

Maybe this was chosen because if there's any qps, you want to guarantee scaling to 1?
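
A small worked example of the effect described here. The issue only shows the truncated call `math.ceil(`, so the formula below (observed QPS divided by a per-replica target) is an assumption, and `replicas_with_ceil` and the numbers are hypothetical.

```python
import math


def replicas_with_ceil(total_qps, target_qps_per_replica):
    """Assumed formula: round the raw replica count up to the next integer."""
    return math.ceil(total_qps / target_qps_per_replica)


target_qps_per_replica = 10

# Barely above the target: 10.5 / 10 = 1.05, which rounds up to 2 replicas.
total_qps = 10.5
num_replicas = replicas_with_ceil(total_qps, target_qps_per_replica)
qps_per_replica = total_qps / num_replicas  # 5.25, about half the target

# Any non-zero traffic still yields at least one replica.
trickle_replicas = replicas_with_ceil(0.1, target_qps_per_replica)

# Exactly at the target, no extra replica is added.
exact_replicas = replicas_with_ceil(10, target_qps_per_replica)
```

Under these assumptions the target acts as a ceiling on per-replica load: a 5% overshoot doubles the replica count and halves the load each replica sees.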

@cblmemo
Collaborator

cblmemo commented Dec 21, 2024

> Error may have been a poor choice in wording. I meant that it hedges towards scaling up since it rounds up with any float. This means qps/replica drops significantly if it is barely breaching the requirement. This may be ideal behavior for some.
>
> Maybe this was chosen because if there's any qps, you want to guarantee scaling to 1?

That is one reason, but the primary reason is that we don't want any overload on the service.

@JGSweets
Contributor Author

> That is one reason, but the primary reason is that we don't want any overload on the service.

It would also be convenient if this were a configurable setting, similar to AWS alarms and triggers.
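
As a sketch of what such a choice could look like, the rounding step could take a mode argument. Everything here is hypothetical: `desired_replicas`, the `rounding` parameter, and its values `"ceil"`, `"floor"`, and `"nearest"` are invented for illustration and do not correspond to any existing setting.

```python
import math

# Hypothetical rounding modes. Python's built-in round() rounds exact
# halves to the nearest even integer (round(2.5) == 2).
_ROUNDERS = {
    "ceil": math.ceil,    # never exceed the per-replica target (overload-averse)
    "floor": math.floor,  # tolerate some overload before adding a replica
    "nearest": round,     # treat the value as a target rather than a max bar
}


def desired_replicas(total_qps, target_qps_per_replica, rounding="ceil"):
    """Compute a replica count under the chosen (hypothetical) rounding mode."""
    raw = total_qps / target_qps_per_replica
    num_replicas = _ROUNDERS[rounding](raw)
    # Keep the guarantee mentioned in the thread: any traffic gets one replica.
    if total_qps > 0:
        num_replicas = max(num_replicas, 1)
    return num_replicas
```

For 10.5 QPS against a target of 10 per replica, `"ceil"` gives 2 replicas while `"floor"` and `"nearest"` give 1; defaulting to `"ceil"` would keep the current overload-averse behavior.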


3 participants