[BUG] Azure ServiceBus memory leak #42717

Open
gataricd opened this issue Oct 30, 2024 · 6 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus

Comments

@gataricd

Describe the bug
The number of ServiceBusReactorAmqpConnection instances keeps rising until the heap runs out of memory. When the app starts there are 44 instances, because we use 44 queues, but over time the count rises above 500 and eventually we get an OutOfMemoryError. The heap size is limited to around 500 MB.
When the OutOfMemoryError happens, all the ServiceBusReactorAmqpConnection instances have sessionMap set to 0. This led us to the conclusion that the garbage collector is not cleaning up connections that are no longer in use.
We have never had a similar problem before, and it happens under normal load.

Exception or Stack Trace
The following exceptions might be relevant to the problem:

reactor.core.Exceptions$ErrorCallbackNotImplemented: java.lang.NullPointerException: Cannot invoke "java.util.List.add(Object)" because "this._sessions" is null
Caused by: java.lang.NullPointerException: Cannot invoke "java.util.List.add(Object)" because "this._sessions" is null
	at org.apache.qpid.proton.engine.impl.ConnectionImpl.session(ConnectionImpl.java:91)
	at org.apache.qpid.proton.engine.impl.ConnectionImpl.session(ConnectionImpl.java:39)

and

com.azure.core.amqp.exception.AmqpException: onSessionRemoteClose connectionId[MF_1e9f94_1730303967974], entityName[mdk-mnp-command-queue] condition[Error{condition=amqp:connection:forced, description='The connection was closed by container 'ce030eed2af746d4a84caa602d8b170a_G0' because it did not have any active links in the past 300000 milliseconds. TrackingId:ce030eed2af746d4a84caa602d8b170a_G0, SystemTracker:gateway5, Timestamp:2024-10-30T16:24:05', info=null}], errorContext[NAMESPACE: mdk-westeurope-eu-e-servicebus.servicebus.windows.net. ERROR CONTEXT: N/A, PATH: some-queue]
	at com.azure.core.amqp.implementation.ExceptionUtil.toException(ExceptionUtil.java:90)
	at com.azure.core.amqp.implementation.handler.SessionHandler.onSessionRemoteClose(SessionHandler.java:139)

These exceptions started occurring when we switched from 7.17.1 to 7.17.3.

To Reproduce
There are no special steps to reproduce.

Code Snippet
The following code is executed for every queue:

val processorClient: ServiceBusProcessorClient = ServiceBusClientBuilder()
    .credential(serviceBusFullyQualifiedName, DefaultAzureCredentialBuilder().build())
    .processor()
    .queueName(queueName)
    .processMessage(handler)
    .processError { context -> processError(context) }
    .maxConcurrentCalls(10)
    .disableAutoComplete()
    .maxAutoLockRenewDuration(Duration.ZERO)
    .prefetchCount(0)
    .buildProcessorClient()
processorClient.start()

Expected behavior
The memory consumption of the Service Bus library shouldn't rise to the point that it causes an OutOfMemoryError.

Setup (please complete the following information):
OS: Linux
Library/Libraries: com.azure:azure-messaging-servicebus:7.17.3
Java version: 21
App Server/Environment: AKS
Frameworks: Spring Boot

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus labels Oct 30, 2024

@anuchandy @conniey @lmolkova


Thank you for your feedback. Tagging and routing to the team member best able to assist.

@anuchandy
Member

anuchandy commented Oct 30, 2024

Hello @gataricd, this was resolved recently; you can find more details in #41865.

Note that you will still see the session disconnect/reconnect logs (which is expected), but the new version should address the NullPointerException. Please follow the steps below:

Update the dependency to 7.17.5

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-messaging-servicebus</artifactId>
    <version>7.17.5</version>
</dependency>

Enable "com.azure.core.amqp.cache" on the ServiceBusClientBuilder

When building any client (ServiceBusProcessorClient, ServiceBusReceiverClient, ServiceBusSenderClient, etc.), set the "com.azure.core.amqp.cache" configuration as shown below. Make sure this configuration is set everywhere the application creates a new ServiceBusClientBuilder:

ServiceBusClientBuilder()
    .connectionString(queueProperties.connectionString())
    // Opt in to the connection cache for every builder the application creates.
    .configuration(
        ConfigurationBuilder()
            .putProperty("com.azure.core.amqp.cache", "true")
            .build()
    )
    .processor()
    .queueName(queueName)
    .processMessage(handler)
    .processError { context -> processError(context) }
    .maxConcurrentCalls(10)
    .disableAutoComplete()
    .maxAutoLockRenewDuration(Duration.ZERO)
    .prefetchCount(0)
    .buildProcessorClient()

Choosing this configuration is important to resolve the problem: java.lang.NullPointerException: Cannot invoke "java.util.List.add(Object)" because "this._sessions" is null

Ensure the right transitive dependencies

Make sure the transitive dependencies (azure-core-amqp, azure-core) are resolved to expected versions.

mvn dependency:tree
[INFO] ...
[INFO] +- com.azure:azure-messaging-servicebus:jar:7.17.5:compile
[INFO] |  +- com.azure:azure-core:jar:1.53.0:compile
[INFO] |  |  +- ..
[INFO] |  |  \- ...
[INFO] |  \- com.azure:azure-core-amqp:jar:2.9.10:compile
[INFO] |     +- com.microsoft.azure:qpid-proton-j-extensions:jar:1.2.5:compile
[INFO] |     \- org.apache.qpid:proton-j:jar:0.34.1:compile
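If you want to check this automatically rather than by eye, something along these lines could work. It is only a sketch: the class name `DependencyTreeCheck` and its methods are hypothetical, it uses only the JDK, and it assumes the plain `group:artifact:jar:version:scope` coordinate form shown in the output above (coordinates with a classifier are not handled).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Azure SDK or Maven.
public class DependencyTreeCheck {
    // Matches group:artifact:jar:version:scope as printed by `mvn dependency:tree`.
    // Limitation: coordinates that carry a classifier are not parsed correctly.
    private static final Pattern COORDINATE =
        Pattern.compile("([\\w.\\-]+):([\\w.\\-]+):jar:([\\w.\\-]+):(\\w+)");

    /** Returns artifactId -> resolved version for every jar found in the tree output. */
    public static Map<String, String> resolvedVersions(String treeOutput) {
        Map<String, String> versions = new LinkedHashMap<>();
        Matcher m = COORDINATE.matcher(treeOutput);
        while (m.find()) {
            // Keep the first occurrence of each artifact.
            versions.putIfAbsent(m.group(2), m.group(3));
        }
        return versions;
    }

    /** Returns artifactId -> actual version ("null" if absent) for every expectation that is not met. */
    public static Map<String, String> mismatches(String treeOutput, Map<String, String> expected) {
        Map<String, String> resolved = resolvedVersions(treeOutput);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String actual = resolved.get(e.getKey());
            if (!e.getValue().equals(actual)) {
                result.put(e.getKey(), String.valueOf(actual));
            }
        }
        return result;
    }
}
```

Feed it the captured `mvn dependency:tree` output together with the expected versions listed above (azure-core 1.53.0, azure-core-amqp 2.9.10, proton-j 0.34.1, qpid-proton-j-extensions 1.2.5). An empty result from `mismatches` means everything resolved as expected.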

Note: In the upcoming version, opting in via "com.azure.core.amqp.cache" will no longer be needed.

@anuchandy anuchandy added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Oct 30, 2024
@github-actions github-actions bot removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Oct 30, 2024
@Azure Azure deleted a comment from github-actions bot Oct 30, 2024
@anuchandy anuchandy added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Oct 30, 2024
@github-actions github-actions bot removed the needs-author-feedback Workflow: More information is needed from author to address the issue. label Oct 30, 2024

Hi @gataricd. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@gataricd
Author

gataricd commented Oct 31, 2024

/unresolve
@anuchandy Thanks for the answer. Unfortunately, the proposed changes didn't resolve the problem. We are still getting:
java.lang.NullPointerException: Cannot invoke "java.util.List.add(Object)" because "this._sessions" is null
after updating to 7.17.5 and setting com.azure.core.amqp.cache to true. The transitive dependencies are correct.

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Oct 31, 2024
@anuchandy
Member

Hello @gataricd, thanks for trying it. Could you share the DEBUG logs covering 20 minutes before and 20 minutes after this NullPointerException?
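For anyone who needs to cut such a window out of a large log file, here is a rough sketch using only the JDK. It is built on assumptions: the class `LogWindow` and its methods are hypothetical, and each log entry is assumed to start with an ISO-8601 local timestamp such as `2024-10-31T10:00:00`. Lines without a leading timestamp (stack-trace frames, for example) are treated as continuations of the previous entry. Adjust the parsing if your log layout differs.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes lines begin with yyyy-MM-ddTHH:mm:ss.
public class LogWindow {
    /** Parses the leading 19-character timestamp, or returns null for continuation lines. */
    static LocalDateTime timestampOf(String line) {
        if (line.length() < 19) {
            return null;
        }
        try {
            return LocalDateTime.parse(line.substring(0, 19));
        } catch (DateTimeParseException e) {
            return null; // e.g. a stack-trace frame
        }
    }

    /** Returns all lines within +/- margin of the first entry that contains the marker. */
    public static List<String> around(List<String> lines, String marker, Duration margin) {
        LocalDateTime last = null;
        LocalDateTime hit = null;
        for (String line : lines) {
            LocalDateTime ts = timestampOf(line);
            if (ts != null) {
                last = ts;
            }
            // The marker may sit on a continuation line, so use the last seen timestamp.
            if (last != null && line.contains(marker)) {
                hit = last;
                break;
            }
        }
        List<String> out = new ArrayList<>();
        if (hit == null) {
            return out; // marker not found
        }
        LocalDateTime from = hit.minus(margin);
        LocalDateTime to = hit.plus(margin);
        boolean inWindow = false;
        for (String line : lines) {
            LocalDateTime ts = timestampOf(line);
            if (ts != null) {
                inWindow = !ts.isBefore(from) && !ts.isAfter(to);
            }
            if (inWindow) {
                out.add(line);
            }
        }
        return out;
    }
}
```

Calling `LogWindow.around(lines, "NullPointerException", Duration.ofMinutes(20))` would keep the entries from 20 minutes before to 20 minutes after the first match, including their stack traces.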


2 participants