A bug was found in the new pickfirst balancer in gRPC Go that can cause the balancer to get stuck in the IDLE state: grpc/grpc-go#8615.
The pickfirst LB implementation in Java is similar, so it may suffer from the same issue. The sequence of events that leads to the bug is as follows:
- An existing connection breaks. The balancer requests re-resolution and reports IDLE, so PF updates the channel state to IDLE with an idle picker.
- An RPC is made, and the idle picker triggers the balancer to exit idle. The balancer attempts to reconnect the failed subchannel.
- The resolver produces a new endpoint list that no longer contains the endpoint used by the existing subchannel, so PF removes that subchannel. Because the balancer has not yet updated the channel state to CONNECTING, pickfirst still believes it is IDLE and does not start connecting to the new endpoints.
- New RPCs reach the idle picker, but this is a no-op: the idle picker triggers the balancer's `ExitIdle` method only once.
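The sequence above can be reproduced with a small toy model. The sketch below is a hypothetical, heavily simplified pick-first state machine in plain Java. `ToyPickFirst`, `IdlePicker`, and all of their fields and methods are illustrative names, not grpc-java APIs, and the real balancer's subchannel state callbacks are omitted. It shows how reporting IDLE, reconnecting without reporting CONNECTING, and then losing the only subchannel leaves the balancer with no connection attempt in flight and a picker that will never call `exitIdle()` again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical toy model of the pick-first IDLE race; not grpc-java code. */
public class PickFirstIdleRace {
    enum State { IDLE, CONNECTING, READY }

    /** Simplified pick-first balancer tracking one subchannel by endpoint name. */
    static final class ToyPickFirst {
        State reportedState = State.READY; // last state reported to the channel
        List<String> endpoints;
        String activeEndpoint;             // endpoint of the live subchannel, or null
        final List<String> connectAttempts = new ArrayList<>();
        IdlePicker picker;

        ToyPickFirst(List<String> initialEndpoints) {
            endpoints = new ArrayList<>(initialEndpoints);
            activeEndpoint = endpoints.get(0); // start READY on the first endpoint
        }

        /** Step 1: the connection breaks; report IDLE with an idle picker. */
        void onConnectionBroken() {
            // The re-resolution request is omitted in this model.
            reportedState = State.IDLE;
            picker = new IdlePicker(this);
        }

        /** Step 2: called by the idle picker; reconnects but does not report CONNECTING. */
        void exitIdle() {
            if (activeEndpoint == null) {
                activeEndpoint = endpoints.get(0);
            }
            connectAttempts.add(activeEndpoint);
            // reportedState stays IDLE: this is the window the race exploits.
        }

        /** Step 3: the resolver delivers a new endpoint list. */
        void onResolvedAddresses(List<String> newEndpoints) {
            endpoints = new ArrayList<>(newEndpoints);
            if (activeEndpoint != null && !endpoints.contains(activeEndpoint)) {
                activeEndpoint = null; // shut down the subchannel for the removed endpoint
            }
            if (reportedState == State.IDLE) {
                return; // believes it is idle, so it waits for exitIdle() to connect
            }
            if (activeEndpoint == null) {
                activeEndpoint = endpoints.get(0);
                connectAttempts.add(activeEndpoint);
                reportedState = State.CONNECTING;
            }
        }
    }

    /** Idle picker that forwards only the first pick to exitIdle(). */
    static final class IdlePicker {
        private final ToyPickFirst lb;
        private final AtomicBoolean exitIdleRequested = new AtomicBoolean(false);

        IdlePicker(ToyPickFirst lb) {
            this.lb = lb;
        }

        void pick() {
            // Step 4: every pick after the first is a no-op for the balancer.
            if (exitIdleRequested.compareAndSet(false, true)) {
                lb.exitIdle();
            }
        }
    }

    public static void main(String[] args) {
        ToyPickFirst lb = new ToyPickFirst(List.of("A"));
        lb.onConnectionBroken();              // A breaks: IDLE + idle picker
        lb.picker.pick();                     // first RPC: exitIdle() reconnects to A
        lb.onResolvedAddresses(List.of("B")); // A removed; balancer still thinks it is IDLE
        lb.picker.pick();                     // later RPCs: no-op
        lb.picker.pick();
        System.out.println(lb.reportedState + " " + lb.activeEndpoint + " " + lb.connectAttempts);
        // prints "IDLE null [A]": no subchannel and no connection attempt to B
    }
}
```

In this toy model, having `exitIdle()` report CONNECTING (or having `onResolvedAddresses` check whether a connection attempt is actually in flight instead of relying on the last reported state) would make step 3 connect to `B`. Whether that matches the fix chosen for grpc-go#8615 or the right change for Java is not established here.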