Operating System
Linux
Version Information
Using the latest version of ML online endpoint
Steps to reproduce
I have a project that runs an AI model behind an online endpoint as a backend service. The endpoint is configured (set manually in the portal) to auto-scale based on the number of requests.
Expected behavior
I expect the endpoint to scale up within 1-2 minutes, like other services such as virtual machine scale sets.
Actual behavior
With the online endpoint, scaling takes much longer: about 12-18 minutes.
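To make the delay easier to reproduce and compare, here is a minimal sketch of how the scaling time could be computed from periodic observations of the instance count. It assumes that "scaled up" means the instance count rose above its value before the load started. The function name `scaling_delay`, the `Sample` type, and the sample values are hypothetical and made up for illustration; they are not measurements from this endpoint and not part of any SDK.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical sample type: (observation time, observed instance count).
Sample = Tuple[datetime, int]


def scaling_delay(samples: Sequence[Sample], load_start: datetime) -> Optional[timedelta]:
    """Return the time from load_start until the instance count first
    exceeds its baseline, or None if no increase was observed.

    Samples must be sorted by time. The baseline is the last count seen
    before load_start, or the first count seen if none precedes it.
    """
    baseline: Optional[int] = None
    for observed_at, count in samples:
        if observed_at < load_start:
            baseline = count  # keep the most recent pre-load count
            continue
        if baseline is None:
            baseline = count  # no pre-load sample: use the first one
            continue
        if count > baseline:
            return observed_at - load_start
    return None


# Made-up observations, for illustration only.
t0 = datetime(2024, 1, 1, 12, 0)
samples = [
    (t0 - timedelta(minutes=1), 1),
    (t0 + timedelta(minutes=5), 1),
    (t0 + timedelta(minutes=15), 2),
]
delay = scaling_delay(samples, t0)
```

Logging the instance count at a fixed interval during a load test and passing the samples to a helper like this gives one number per run, which makes it easier to tell whether a configuration change actually shortens the delay.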
Additional information
Do you have suggestions for speeding up the scaling time?