-
Notifications
You must be signed in to change notification settings - Fork 890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backoff to avoid excessive retries to Run Service in a duration #3354
Conversation
@@ -62,7 +62,10 @@ public Task<AgentJobRequestMessage> GetJobMessageAsync(string id, CancellationTo | |||
CheckConnection(); | |||
return RetryRequest<AgentJobRequestMessage>( | |||
async () => await _runServiceHttpClient.GetJobMessageAsync(requestUri, id, VarUtil.OS, cancellationToken), cancellationToken, | |||
shouldRetry: ex => ex is not TaskOrchestrationJobAlreadyAcquiredException); | |||
shouldRetry: ex => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with not retrying these errors, means we immediately call to Broker to get the next message, which might be the same job message.
When one of these errors is encountered from Run Service, an async message is sent to Broker. We don't want to keep hammering the service in a tight loop if there are delays processing the message queue.
maxBackoff: s_maxBackoff, | ||
deltaBackoff: s_backoffCoefficient); | ||
Trace.Warning($"Back off {backoff.TotalSeconds} seconds before next attempt."); | ||
await Task.Delay(backoff, token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, Task.Delay
will throw on cancellation token fire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's OK, but I switched to Host.Delay so I could unit test this
fb896c0
to
d73aabc
Compare
d73aabc
to
ec1e338
Compare
Helps avoid excessive calls to Run Service when encountering non-retriable errors from /acquirejob. Normally we rely on the HTTP clients to back off between retry attempts. However, acquiring a job involves calls to both Run Serivce and Broker. And Run Service and Broker communicate with each other in an async fashion.
When Run Service encounters a non-retriable error, it sends an async message to Broker. The runner will, however, immediately call Broker to get the next message. If the async event from Run Service to Broker has not yet been processed, the next message from Broker may be the same job message.
The error throttler helps us back off when encountering successive, non-retriable errors from /acquirejob.