Flaky Test: RingHash_SwitchToLowerPriorityAndThenBack #7783
Comments
Actual failure is in Test/ServerSideXDS_FileWatcherCerts.
The child balancer is orphaned due to a race in the priority balancer which results in grpc-go/xds/internal/balancer/priority/balancer_priority.go Lines 122 to 133 in ae2a04f
Inside grpc-go/internal/balancergroup/balancergroup.go Lines 371 to 415 in ae2a04f
At the same time, the ClientConn starts closing, resulting in balancer tree closing. When the priority balancer begins shutdown, it closes the grpc-go/xds/internal/balancer/priority/balancer.go Lines 221 to 226 in ae2a04f
This results in two concurrent calls into balancergroup: One to grpc-go/internal/balancergroup/balancergroup.go Lines 561 to 569 in ae2a04f
When the cache is cleared before priority-0-1 is added, a balancer is leaked.
We haven't seen this so far on GitHub Actions, but it seems like this might be a bug in the code rather than in the test.
Full test log here: https://pastebin.com/u2v2JshT
The problem seems to be as follows:
The test passes, but there is a leaked goroutine. Basically, the child of priority1, which is outlier detection, is not closed. And the child of outlier detection, which is clusterimpl, is not closed either.
I believe the problem arises because when priority1 is closed, it is moved to the idle cache in the balancergroup, but when the priority LB is closed soon after, for some reason, the child in the idle cache is not being cleaned up.
This failure happens about 2 times out of 100k, but I feel it is worth investigating.