Azure service bus: K8 pods become idle after AMQP connection error #36368
Labels
Client
This issue points to a problem in the data-plane of the library.
customer-reported
Issues that are reported by GitHub users external to the Azure organization.
Messaging
Messaging crew
needs-author-feedback
More information is needed from author to address the issue.
no-recent-activity
There has been no recent activity on this issue.
question
The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Service Bus
Describe the bug
The issue we are facing is similar to #28996. Our K8 pods listen to service bus messages continuously, process these messages, and send another message to a service bus queue. We often see messages like "Connection keep-alive for 'SendClientAsync' failed: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')." These are mostly temporary, and the pods reconnect on their own. However, we have seen instances where this does not happen. In that case the pod becomes idle and does not process any messages until it is restarted. We have had many outages because of this behaviour. The pod emits no further logs after this last log message. The message is also logged as Log.Info and not thrown as an error from the SDK, so we are unable to capture it in our try-catch and retry on our end.
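Since the failure only appears as an info-level log line, one possible workaround is to watch the logs for it. The following is a minimal sketch, not the SDK's own mechanism: `KeepAliveFailureHandler` and `install_keepalive_monitor` are hypothetical names, the marker text is copied from the log line quoted above, and the name of the logger that emits it is an assumption that has to be checked against the installed SDK version.

```python
import logging
import threading

# Substring copied from the log line quoted in this issue.
KEEPALIVE_MARKER = "Connection keep-alive for"


class KeepAliveFailureHandler(logging.Handler):
    """Hypothetical helper: sets an event when a keep-alive failure is logged."""

    def __init__(self, marker=KEEPALIVE_MARKER):
        super().__init__(level=logging.INFO)
        self.marker = marker
        self.failure_seen = threading.Event()

    def emit(self, record):
        try:
            message = record.getMessage()
        except Exception:
            # Never let a badly formatted record break the application.
            return
        if self.marker in message and "failed" in message:
            self.failure_seen.set()


def install_keepalive_monitor(logger_name=""):
    """Attach the handler to a logger; the empty name means the root logger.

    Assumption: the caller passes the logger that actually emits the
    keep-alive message. Lowering the level to INFO may also make other
    handlers on the same logger more verbose.
    """
    logger = logging.getLogger(logger_name)
    if logger.getEffectiveLevel() > logging.INFO:
        logger.setLevel(logging.INFO)
    handler = KeepAliveFailureHandler()
    logger.addHandler(handler)
    return handler
```

The application could then poll `handler.failure_seen.is_set()` and, if no message has been processed for some time afterwards, recreate its clients or exit. Matching on log text is fragile and would break if the wording changes between SDK versions.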
To Reproduce
Steps to reproduce the behavior:
We do not have a foolproof way to reproduce this, as it happens randomly, but one way to try this would be to
We do have a sequence of logs that appears during this event.
Expected behavior
Pods should be able to reconnect.
If not, an error should be thrown that can help us handle retries or automatically kill pods.
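Until the SDK raises an error here, the "automatically kill pods" part can be approximated on the application side. This is a rough sketch under stated assumptions: `IdleWatchdog` is a hypothetical helper, not part of the SDK, and it assumes the application calls `touch()` on every pass of its receive loop (including passes that return no messages), so that an empty queue is not mistaken for a hang.

```python
import os
import threading
import time


class IdleWatchdog:
    """Hypothetical helper: runs a callback if touch() is not called in time."""

    def __init__(self, max_idle_seconds, on_stall=None, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        # Default action: hard-exit so Kubernetes restarts the container.
        self._on_stall = on_stall or (lambda: os._exit(1))
        self._clock = clock
        self._last_activity = clock()
        self._lock = threading.Lock()

    def touch(self):
        """Record activity; call this on every receive loop iteration."""
        with self._lock:
            self._last_activity = self._clock()

    def is_stalled(self):
        with self._lock:
            return self._clock() - self._last_activity > self.max_idle_seconds

    def check(self):
        """Run the stall callback if the idle limit is exceeded."""
        if self.is_stalled():
            self._on_stall()
            return True
        return False

    def start(self, interval_seconds=30.0):
        """Check periodically in a daemon thread until a stall is detected."""
        def loop():
            while not self.check():
                time.sleep(interval_seconds)

        thread = threading.Thread(target=loop, daemon=True, name="idle-watchdog")
        thread.start()
        return thread
```

In an asyncio application the thread only helps if the receive loop itself keeps calling `touch()`. If the event loop is what has hung, the watchdog thread still runs and can exit the process. A Kubernetes liveness probe that checks the same timestamp would be an alternative to exiting from inside the process.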
Chain of events leading to this error:
debugLogs.txt