Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ServiceBusConnectionError after running idle #36376

Closed
bosegeorge1212 opened this issue Jul 5, 2024 · 19 comments
Closed

ServiceBusConnectionError after running idle #36376

bosegeorge1212 opened this issue Jul 5, 2024 · 19 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. issue-addressed The Azure SDK team member assisting with this issue believes it to be addressed and ready to close. Messaging Messaging crew question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus

Comments

@bosegeorge1212
Copy link

bosegeorge1212 commented Jul 5, 2024

  • azure-servicebus:
  • 7.12.1:
  • Azure Container Apps:
  • 3.12:

Describe the bug
We are getting the following error while receiving messages from Service Bus: azure.servicebus.exceptions.ServiceBusConnectionError: Cannot read frame due to exception: [Errno 104] Connection reset by peer. Error condition: amqp:socket-error.

We observe this behavior when there are no messages in the subscription and you keep the app running.

To Reproduce
There are no exact steps to reproduce this issue. It happens inconsistently. We observe this happening when there are no messages in the subscription, and the app has been running for a long duration.

Expected behavior
The SDK should retry with some delay.

Logs

2024-07-02 17:57:17,088 - azure.servicebus._pyamqp.aio._transport_async - DEBUG - Transport read failed: ConnectionResetError(104, 'Connection reset by peer')
2024-07-02 17:57:17,088 - azure.servicebus._pyamqp.aio._cbs_async - DEBUG - CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
2024-07-02 17:57:17,088 - azure.servicebus.aio._base_handler_async - INFO - AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Can not read frame due to exception: [Errno 104] Connection reset by peer').
2024-07-02 17:57:17,088 - azure.servicebus._pyamqp.aio._link_async - DEBUG - -> DetachFrame(handle=3, closed=True, error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - DEBUG - -> DetachFrame(handle=2, closed=True, error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._management_link_async - INFO - Management link receiver state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - DEBUG - -> DetachFrame(handle=1, closed=True, error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._management_link_async - INFO - Management link sender state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._session_async - DEBUG - -> EndFrame(error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._session_async - INFO - An error occurred when ending the session: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._session_async - INFO - Session state changed: <SessionState.MAPPED: 3> -> <SessionState.UNMAPPED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - DEBUG - -> CloseFrame(error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - INFO - An error occurred when closing the connection: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - INFO - Connection state changed: <ConnectionState.OPENED: 9> -> <ConnectionState.END: 13>
2024-07-02 17:57:17,883 - azure.servicebus.aio._base_handler_async - INFO - 'servicebus.pysdk-22e5f6be' operation has exhausted retry. Last exception: ServiceBusConnectionError('Can not read frame due to exception: [Errno 104] Connection reset by peer Error condition: amqp:socket-error.').

this is the code snippet inside a package

class SubscriptionListener:
    def __init__(self, connection_string: str, topic_name: str, subscription_name: str, max_concurrent_tasks=5):
        self._connection_string = connection_string
        self._topic_name=topic_name
        self._subscription_name = subscription_name
        self._message_handlers: Dict[type, Callable] = {}

    def add_message_handler(self, message_type: type, handler: Callable):
        self._message_handlers[message_type] = handler

    async def receive_messages(self):
        async with ServiceBusClient.from_connection_string(self._connection_string,logging_enable=True) as client:
            async with client.get_subscription_receiver(topic_name=self._topic_name,subscription_name= self._subscription_name) as receiver:
                async for message in receiver:
                    await self._process_and_complete_message(receiver, message)    

and this is code snippet inside our API that invoke the above

service_bus_receiver = SubscriptionListener(conn_str, topic_name,subscription_name)
service_bus_receiver.add_message_handler(ExtractionMessage, extraction_message_handler.handle)
await service_bus_receiver.receive_messages()
@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus labels Jul 5, 2024
Copy link

github-actions bot commented Jul 5, 2024

Thank you for your feedback. Tagging and routing to the team member best able to assist.

@kashifkhan
Copy link
Member

Hi @bosegeorge1212 ,

I am seeing this line in your logs which shows up after retries are done.

2024-07-02 17:57:17,883 - azure.servicebus.aio._base_handler_async - INFO - 'servicebus.pysdk-22e5f6be' operation has exhausted retry. Last exception: ServiceBusConnectionError('Can not read frame due to exception: [Errno 104] Connection reset by peer Error condition: amqp:socket-error.').

It seems like the connection is dropping. Are you able to provide us debug logging with frames turned on please?

logging_enable=True is also key here for us to see the exchange of information between client and sdk

import logging
import sys

handler = logging.StreamHandler(stream=sys.stdout)
logger = logging.getLogger('azure.servicebus')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

...

from azure.servicebus import ServiceBusClient

client = ServiceBusClient(..., logging_enable=True)

@kashifkhan kashifkhan added the Messaging Messaging crew label Jul 5, 2024
@bosegeorge1212
Copy link
Author

bosegeorge1212 commented Jul 6, 2024

Hi @kashifkhan

i have already added it and the above logs are with that. Thanks in advance

logger.setLevel(logging.DEBUG)

ServiceBusClient.from_connection_string(conn_str=connection_string,  logging_enable=True)

@swathipil
Copy link
Member

swathipil commented Jul 8, 2024

Hi @bosegeorge1212 - It looks like all retries have been exhausted as per:

2024-07-02 17:57:17,883 - azure.servicebus.aio._base_handler_async - INFO - 'servicebus.pysdk-22e5f6be' operation has exhausted retry. Last exception: ServiceBusConnectionError('Can not read frame due to exception: [Errno 104] Connection reset by peer Error condition: amqp:socket-error.').

The last retry raised an exception due to connection drop as @kashifkhan mentioned, which is expected behavior. By default, the operation is retried 3 times. The first couple of retries are not included in the log snippet above. To confirm that all retries failed due to connection dropping, would you be able to provide logs beginning from before this exception started occurring?

If you'd like to modify default retry behavior, you can adjust retry configs, listed [here].

@swathipil swathipil added needs-author-feedback More information is needed from author to address the issue. and removed needs-team-attention This issue needs attention from Azure service team or SDK team labels Jul 8, 2024
Copy link

github-actions bot commented Jul 8, 2024

Hi @bosegeorge1212. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@bosegeorge1212
Copy link
Author

@swathipil

Is there any way to identify the root cause of the connection loss? It appears to be a recurring pattern where the connection is lost every 10-15 hours of no messages.

We are hosting the application in Azure Container Apps and utilizing a VNET as well.

@github-actions github-actions bot added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Jul 8, 2024
@swathipil
Copy link
Member

swathipil commented Jul 9, 2024

@bosegeorge1212 ServiceBusConnectionErrors are caused by transient network issues or service problems. However, this error is not returned by the service, as they would provide more information such as a tracking ID or a message to retry. This is an error raised by the OS to indicate that the connection dropped. Given that it's not able to reconnect on retry, it seems that it was disconnected for a while.

Is there other activity happening on your VNET (heavier than usual) that could be causing issues?

In order to provide more assistance, we would also need the full logs including all retries.

@swathipil swathipil added needs-author-feedback More information is needed from author to address the issue. and removed needs-team-attention This issue needs attention from Azure service team or SDK team labels Jul 9, 2024
Copy link

github-actions bot commented Jul 9, 2024

Hi @bosegeorge1212. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@bosegeorge1212
Copy link
Author

logs_3_7_2024.xlsx

@github-actions github-actions bot added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Jul 11, 2024
@bosegeorge1212
Copy link
Author

@swathipil
I have added the logs. I tried to color the start and end of the error in yellow.

Is there any other activity happening on your VNET (heavier than usual) that could be causing issues? -> Nothing as far as we know.

@swathipil
Copy link
Member

Thanks @bosegeorge1212. We'll investigate and get back to you asap.

@swathipil
Copy link
Member

swathipil commented Jul 20, 2024

Hi @bosegeorge1212 - I'm currently working on trying to reproduce the connection reset by peer error.

In the logs that you provided, has anything been stripped out? More specifically, have any logs been removed between these two lines?

2024-07-02   17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - INFO -   Connection state changed: <ConnectionState.OPENED: 9> ->   <ConnectionState.END: 13>

and

2024-07-02 17:57:17,883 - azure.servicebus.aio._base_handler_async - INFO - 'servicebus.pysdk-22e5f6be' operation has exhausted retry. Last exception: ServiceBusConnectionError('Can not read frame due to exception: [Errno 104] Connection reset by peer Error condition: amqp:socket-error.').

I would expect to see logs with the message: "Retrying..". Example of expected logs in the comment below.

@swathipil swathipil removed the needs-team-attention This issue needs attention from Azure service team or SDK team label Jul 20, 2024
@swathipil swathipil added the needs-author-feedback More information is needed from author to address the issue. label Jul 20, 2024
Copy link

Hi @bosegeorge1212. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@swathipil
Copy link
Member

swathipil commented Jul 20, 2024

@bosegeorge1212 - While I was not able to trigger a ConnectionResetError, I was able to trigger a ConnectionRefusedError, which follows the same path and raises the same ServiceBusConnectionError. In the logs, I am able to see that the client retried 3 times, as per the default.

These are the steps I followed:

  1. Run your receive sample and wait for CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False in the logs.
  2. Run sudo iptables -A OUTPUT -p tcp --dport 5671 -j REJECT
  3. Disconnect from your network and wait about 30 seconds.
  4. You should see the ConnectionResetError in the logs.

You can check that your rule has been added to the iptable by running:
sudo iptables -L OUTPUT -v -n --line-numbers

And delete a rule with:
sudo iptables -D OUTPUT 1

Logs:

CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
Async transport error: [Errno 111] Connection refused
Transport read failed: ConnectionRefusedError(111, 'Connection refused')
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Can not read frame due to exception: [Errno 111] Connection refused').
-> DetachFrame(handle=3, closed=True, error=None)
An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
-> DetachFrame(handle=2, closed=True, error=None)
An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
Management link receiver state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
-> DetachFrame(handle=1, closed=True, error=None)
An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
Management link sender state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
-> EndFrame(error=None)
An error occurred when ending the session: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Session state changed: <SessionState.MAPPED: 3> -> <SessionState.UNMAPPED: 0>
-> CloseFrame(error=None)
An error occurred when closing the connection: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Connection state changed: <ConnectionState.OPENED: 9> -> <ConnectionState.END: 13>
'servicebus.pysdk-0315aca4' has an exception (ServiceBusConnectionError('Can not read frame due to exception: [Errno 111] Connection refused Error condition: amqp:socket-error.')). Retrying...
Transport connect failed: gaierror(-3, 'Temporary failure in name resolution')
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution').
'servicebus.pysdk-0315aca4' has an exception (ServiceBusConnectionError('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution Error condition: amqp:socket-error.')). Retrying...
Transport connect failed: gaierror(-3, 'Temporary failure in name resolution')
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution').
'servicebus.pysdk-0315aca4' has an exception (ServiceBusConnectionError('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution Error condition: amqp:socket-error.')). Retrying...
Transport connect failed: gaierror(-3, 'Temporary failure in name resolution')
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution').
'servicebus.pysdk-0315aca4' operation has exhausted retry. Last exception: ServiceBusConnectionError('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution Error condition: amqp:socket-error.').

@github-actions github-actions bot added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Jul 24, 2024
@bosegeorge1212
Copy link
Author

bosegeorge1212 commented Jul 24, 2024

@bosegeorge1212 - While I was not able to trigger a ConnectionResetError, I was able to trigger a ConnectionRefusedError, which follows the same path and raises the same ServiceBusConnectionError. In the logs, I am able to see that the client retried 3 times, as per the default.

These are the steps I followed:

  1. Run your receive sample and wait for CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False in the logs.
  2. Run sudo iptables -A OUTPUT -p tcp --dport 5671 -j REJECT
  3. Disconnect from your network and wait about 30 seconds.
  4. You should see the ConnectionResetError in the logs.

You can check that your rule has been added to the iptable by running: sudo iptables -L OUTPUT -v -n --line-numbers

And delete a rule with: sudo iptables -D OUTPUT 1

Logs:

CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
Async transport error: [Errno 111] Connection refused
Transport read failed: ConnectionRefusedError(111, 'Connection refused')
CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Can not read frame due to exception: [Errno 111] Connection refused').
-> DetachFrame(handle=3, closed=True, error=None)
An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
-> DetachFrame(handle=2, closed=True, error=None)
An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
Management link receiver state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
-> DetachFrame(handle=1, closed=True, error=None)
An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
Management link sender state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
-> EndFrame(error=None)
An error occurred when ending the session: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Session state changed: <SessionState.MAPPED: 3> -> <SessionState.UNMAPPED: 0>
-> CloseFrame(error=None)
An error occurred when closing the connection: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 111] Connection refused')
Connection state changed: <ConnectionState.OPENED: 9> -> <ConnectionState.END: 13>
'servicebus.pysdk-0315aca4' has an exception (ServiceBusConnectionError('Can not read frame due to exception: [Errno 111] Connection refused Error condition: amqp:socket-error.')). Retrying...
Transport connect failed: gaierror(-3, 'Temporary failure in name resolution')
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution').
'servicebus.pysdk-0315aca4' has an exception (ServiceBusConnectionError('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution Error condition: amqp:socket-error.')). Retrying...
Transport connect failed: gaierror(-3, 'Temporary failure in name resolution')
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution').
'servicebus.pysdk-0315aca4' has an exception (ServiceBusConnectionError('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution Error condition: amqp:socket-error.')). Retrying...
Transport connect failed: gaierror(-3, 'Temporary failure in name resolution')
AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution').
'servicebus.pysdk-0315aca4' operation has exhausted retry. Last exception: ServiceBusConnectionError('Failed to initiate the connection due to exception: [Errno -3] Temporary failure in name resolution Error condition: amqp:socket-error.').

We are currently investigating an issue where our application, deployed in Azure Container Apps, experiences intermittent connectivity problems with the Azure Service Bus after 10-20 hours of inactivity. Despite having implemented retry logic, we are keen to identify the root cause of this issue.

While we have simulated network issues such as ConnectionRefusedError and ConnectionResetError to test our retry logic, we are now focused on understanding why these errors occur after prolonged periods of inactivity. Given that we do not have access to the underlying VM to perform more granular network testing, we are exploring other methods to pinpoint the cause.

Could you please suggest any approaches or tools that could help us diagnose and resolve this issue effectively?

@bosegeorge1212
Copy link
Author

Hi @bosegeorge1212 - I'm currently working on trying to reproduce the connection reset by peer error.

In the logs that you provided, has anything been stripped out? More specifically, have any logs been removed between these two lines?

2024-07-02   17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - INFO -   Connection state changed: <ConnectionState.OPENED: 9> ->   <ConnectionState.END: 13>

and

2024-07-02 17:57:17,883 - azure.servicebus.aio._base_handler_async - INFO - 'servicebus.pysdk-22e5f6be' operation has exhausted retry. Last exception: ServiceBusConnectionError('Can not read frame due to exception: [Errno 104] Connection reset by peer Error condition: amqp:socket-error.').

I would expect to see logs with the message: "Retrying..". Example of expected logs in the comment below.

nothing is missed. adding once more

2024-07-02 17:57:17,088 - azure.servicebus._pyamqp.aio._transport_async - DEBUG - Transport read failed: ConnectionResetError(104, 'Connection reset by peer')
2024-07-02 17:57:17,088 - azure.servicebus._pyamqp.aio._cbs_async - DEBUG - CBS status check: state == <CbsAuthState.OK: 0>, expired == False, refresh required == False
2024-07-02 17:57:17,088 - azure.servicebus.aio._base_handler_async - INFO - AMQP error occurred: (AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')), condition: (<ErrorCondition.SocketError: b'amqp:socket-error'>), description: ('Can not read frame due to exception: [Errno 104] Connection reset by peer').
2024-07-02 17:57:17,088 - azure.servicebus._pyamqp.aio._link_async - DEBUG - -> DetachFrame(handle=3, closed=True, error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - DEBUG - -> DetachFrame(handle=2, closed=True, error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._management_link_async - INFO - Management link receiver state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - DEBUG - -> DetachFrame(handle=1, closed=True, error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - An error occurred when detaching the link: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._link_async - INFO - Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._management_link_async - INFO - Management link sender state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACHED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._session_async - DEBUG - -> EndFrame(error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._session_async - INFO - An error occurred when ending the session: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._session_async - INFO - Session state changed: <SessionState.MAPPED: 3> -> <SessionState.UNMAPPED: 0>
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - DEBUG - -> CloseFrame(error=None)
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - INFO - An error occurred when closing the connection: AMQPConnectionError('Error condition: ErrorCondition.SocketError\n Error Description: Can not read frame due to exception: [Errno 104] Connection reset by peer')
2024-07-02 17:57:17,089 - azure.servicebus._pyamqp.aio._connection_async - INFO - Connection state changed: <ConnectionState.OPENED: 9> -> <ConnectionState.END: 13>
2024-07-02 17:57:17,883 - azure.servicebus.aio._base_handler_async - INFO - 'servicebus.pysdk-22e5f6be' operation has exhausted retry. Last exception: ServiceBusConnectionError('Can not read frame due to exception: [Errno 104] Connection reset by peer Error condition: amqp:socket-error.').

@swathipil
Copy link
Member

Hi @bosegeorge1212 - The logs show that the SDK raises the error when the OS error occurs. Based on the info you've provided and the repro that I added here, the retry logic in the SDK is working as expected.

We are currently investigating an issue where our application, deployed in Azure Container Apps, experiences intermittent connectivity problems with the Azure Service Bus after 10-20 hours of inactivity.

FYI: The connection to the service is not meant to stay open for 10-20 hours of inactivity. In general, the service has a [10 minute idle timeout].

Since these connection errors are not from the Service Bus service, we would suggest following the [Container Apps troubleshooting guide] or opening an issue with Container Apps.

@swathipil swathipil added issue-addressed The Azure SDK team member assisting with this issue believes it to be addressed and ready to close. and removed needs-team-attention This issue needs attention from Azure service team or SDK team labels Jul 30, 2024
Copy link

Hi @bosegeorge1212. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

Copy link

github-actions bot commented Aug 6, 2024

Hi @bosegeorge1212, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

@github-actions github-actions bot closed this as completed Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. issue-addressed The Azure SDK team member assisting with this issue believes it to be addressed and ready to close. Messaging Messaging crew question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus
Projects
None yet
Development

No branches or pull requests

4 participants