Azure service bus: K8 pods become idle after AMQP connection error #36368

juhitiwari · 2024-07-05T05:05:29Z

Package Name: azure-servicebus
Package Version: 7.11.1
Operating System: Debian Container (kubernetes pod)
Python Version: 3.9.19

Describe the bug
The issue we are facing is similar to #28996. Our K8 pods listen to service bus messages continuously, process them, and send another message to a service bus queue. We often see messages like "Connection keep-alive for 'SendClientAsync' failed: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')." These are mostly temporary, and the pods reconnect on their own. But we have seen instances where this does not happen: the pod becomes idle and does not process any messages until it is restarted. We have had many outages because of this behaviour. No further logs are emitted from the pod; this is the last log message. It is also logged at info level (Log.Info) rather than raised as an error by the SDK, so we cannot catch it in our try/except and retry on our end.
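Since the failure only surfaces as a log record, one possible stopgap is to watch the log stream itself. The sketch below is not part of the SDK: `KeepAliveFailureHandler` and `install_keep_alive_monitor` are hypothetical names, and it assumes the keep-alive message is emitted by a logger under the `azure.servicebus` namespace with the wording quoted above.

```python
import logging
import threading


class KeepAliveFailureHandler(logging.Handler):
    """Hypothetical helper: flags log records that report a keep-alive failure."""

    def __init__(self, marker="Connection keep-alive"):
        super().__init__(level=logging.INFO)
        self.marker = marker
        self.failed = threading.Event()  # set once a failure has been seen
        self.count = 0                   # number of matching records so far

    def emit(self, record):
        # getMessage() merges the %-style arguments into the message text.
        message = record.getMessage()
        if self.marker in message and "failed" in message:
            self.count += 1
            self.failed.set()


def install_keep_alive_monitor(logger_name="azure.servicebus"):
    """Attach the handler to the given logger and return it."""
    logger = logging.getLogger(logger_name)
    # Assumption: the record is logged at INFO, so INFO must not be filtered out.
    if logger.getEffectiveLevel() > logging.INFO:
        logger.setLevel(logging.INFO)
    handler = KeepAliveFailureHandler()
    logger.addHandler(handler)
    return handler
```

The application can then poll `handler.failed` (or `handler.count`) and decide whether to recreate its clients or exit. Matching on message text is brittle and would need revisiting if the SDK changes its wording.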

To Reproduce
Steps to reproduce the behavior:
We do not have a foolproof way to reproduce this, as it happens randomly, but one way to try would be:

1. Build a service bus consumer and publisher (SendClientAsync seems to be in the publisher module, so we think the error might be coming from there).
2. Keep the channel idle for 2-3 hours.
3. Check that the last log message is "Connection keep-alive for 'SendClientAsync' failed: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')."
4. Put a message in service bus; the message will not be consumed.

We do have the sequence of logs emitted during this event.

Expected behavior
Pods should be able to reconnect.
If not, an error should be thrown that can help us handle retries or automatically kill pods.
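Until the SDK raises an error in this situation, the "automatically kill pods" part could be approximated on the application side with a heartbeat. This is only a sketch under assumptions: `ActivityWatchdog` is a hypothetical class, and it presumes the receive loop returns periodically (even when no message arrives) so that it can record a beat.

```python
import time


class ActivityWatchdog:
    """Hypothetical helper: tracks how long ago the receive loop last made progress."""

    def __init__(self, stale_after_seconds, clock=time.monotonic):
        self._stale_after = stale_after_seconds
        self._clock = clock              # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self):
        """Record that the receive loop is still making progress."""
        self._last_beat = self._clock()

    def seconds_since_beat(self):
        return self._clock() - self._last_beat

    def is_stale(self):
        """True when no beat has been recorded within the allowed window."""
        return self.seconds_since_beat() > self._stale_after
```

The loop would call `beat()` each time a receive call returns, and a timer thread or a liveness endpoint would check `is_stale()` and exit the process with a non-zero code so that Kubernetes restarts the pod. The threshold has to be longer than the longest expected wait in the receive loop, otherwise an empty queue would look like a hang.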

Chain of events leading to this error:
debugLogs.txt

github-actions · 2024-07-05T05:06:08Z

Thank you for your feedback. Tagging and routing to the team member best able to assist.

kashifkhan · 2024-07-05T20:35:56Z

@juhitiwari would you also be able to provide some frame logs from when this happens, please? The current snippet of logs is not enough.

logging_enable=True is key here :)

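import logging
import sys

# Send azure.servicebus logs, including DEBUG-level records, to stdout.
handler = logging.StreamHandler(stream=sys.stdout)
logger = logging.getLogger('azure.servicebus')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

...

from azure.servicebus import ServiceBusClient

# logging_enable=True turns on network trace (frame) logging for this client.
client = ServiceBusClient(..., logging_enable=True)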

juhitiwari · 2024-07-08T04:32:42Z

I had enabled logging, but we cannot see the frame logs because our logging level is set to info. Is the debug level required to see them?
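On the level question, a quick check with the standard `logging` module shows why nothing appears at info. This assumes the frame traces are emitted at DEBUG level, which is what the `logger.setLevel(logging.DEBUG)` line in the snippet above suggests; `debug_enabled_at` is a hypothetical helper name.

```python
import logging


def debug_enabled_at(level, logger_name="azure.servicebus"):
    """Return whether DEBUG records would pass the logger at the given level."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    return logger.isEnabledFor(logging.DEBUG)


at_info = debug_enabled_at(logging.INFO)    # False: DEBUG records are dropped
at_debug = debug_enabled_at(logging.DEBUG)  # True: DEBUG records are emitted
```

With the logger at INFO, any DEBUG-level record is filtered out before it reaches a handler, so the level has to be lowered to DEBUG in addition to passing `logging_enable=True`.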

juhitiwari · 2024-07-08T11:12:09Z

I have enabled debug logs as well. Since we can't reproduce this consistently on our end, I will post the new logs here once we hit the issue again.

github-actions · 2024-07-09T17:55:52Z

Hi @juhitiwari. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions · 2024-07-16T21:34:14Z

Hi @juhitiwari, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

github-actions bot assigned kashifkhan Jul 5, 2024

kashifkhan assigned swathipil and l0lawrence Jul 5, 2024

kashifkhan added the Messaging label (Messaging crew) Jul 5, 2024

swathipil added the needs-author-feedback label (More information is needed from author to address the issue.) and removed the needs-team-attention label (This issue needs attention from Azure service team or SDK team) Jul 9, 2024

github-actions bot added the no-recent-activity label (There has been no recent activity on this issue.) Jul 16, 2024

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 31, 2024
