-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[feat] PIP-442: Add memory limits for CommandGetTopicsOfNamespace/CommandWatchTopicList #24833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…fNamespace and CommandWatchTopicList
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #24833 +/- ##
=============================================
+ Coverage 38.56% 74.31% +35.75%
- Complexity 13262 33995 +20733
=============================================
Files 1856 1920 +64
Lines 145287 150039 +4752
Branches 16877 17399 +522
=============================================
+ Hits 56025 111502 +55477
+ Misses 81696 29628 -52068
- Partials 7566 8909 +1343
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements PIP-442 to add memory limits for topic listing operations (CommandGetTopicsOfNamespace and CommandWatchTopicList). It introduces an asynchronous dual memory limiter that tracks separate limits for heap memory (topic list assembly) and direct memory (network buffers) to prevent broker/proxy memory exhaustion when handling large namespaces with thousands of topics.
- Introduces
AsyncSemaphoreandAsyncDualMemoryLimiterinterfaces with comprehensive implementations - Integrates memory limiting into broker and proxy topic listing operations with cancellation support
- Adds 6 new configuration parameters for heap/direct memory limits, timeouts, and queue sizes
- Exposes 26 new metrics (13 Prometheus + 13 OpenTelemetry) for monitoring memory usage and queue behavior
Reviewed Changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pulsar-common/src/main/java/org/apache/pulsar/common/semaphore/ | Core semaphore and dual memory limiter implementations with utility methods |
| pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ | Broker-side integration for topic listing memory control |
| pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/ | Proxy-side integration for topic listing memory control |
| pulsar-broker-common/src/main/java/org/apache/pulsar/broker/topiclistlimit/ | TopicListMemoryLimiter with Prometheus and OpenTelemetry metrics |
| conf/*.conf | Configuration files with new memory limiting parameters |
| Test files | Comprehensive unit and integration tests for the new functionality |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
...ker-common/src/main/java/org/apache/pulsar/broker/topiclistlimit/TopicListMemoryLimiter.java
Outdated
Show resolved
Hide resolved
...-common/src/test/java/org/apache/pulsar/common/semaphore/AsyncDualMemoryLimiterUtilTest.java
Outdated
Show resolved
Hide resolved
...-common/src/test/java/org/apache/pulsar/common/semaphore/AsyncDualMemoryLimiterImplTest.java
Outdated
Show resolved
Hide resolved
pulsar-common/src/main/java/org/apache/pulsar/common/semaphore/AsyncDualMemoryLimiterUtil.java
Outdated
Show resolved
Hide resolved
|
I noticed a separate heap memory issue related to topic listing operations: the childrenCache in AbstractMetadataStore. There's a separate PR to address the unbounded cache: #24868. |
...r-common/src/main/java/org/apache/pulsar/broker/topiclistlimit/TopicListSizeResultCache.java
Show resolved
Hide resolved
...r-common/src/main/java/org/apache/pulsar/broker/topiclistlimit/TopicListSizeResultCache.java
Show resolved
Hide resolved
...r-common/src/main/java/org/apache/pulsar/broker/topiclistlimit/TopicListSizeResultCache.java
Show resolved
Hide resolved
pulsar-common/src/main/java/org/apache/pulsar/common/semaphore/AsyncSemaphoreImpl.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java
Show resolved
Hide resolved
| private final Backoff retryBackoff; | ||
|
|
||
|
|
||
| public TopicListService(PulsarService pulsar, ServerCnx connection, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a notification for the client, to let the client know which topic was changed; it should not be limited by the new feature. the change https://github.com/apache/pulsar/pull/24833/files#diff-5cb94e51fda5445d6fc829fda24eeccda65bf166643a7bf6957fa195516bca28L363-R392 is also related
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a notification for the client, to let the client know which topic was changed; it should not be limited by the new feature.
I disagree. To properly limit memory usage, it's also necessary to handle CommandWatchTopicListSuccess / CommandWatchTopicUpdate as explained in https://github.com/apache/pulsar/blob/master/pip/pip-442.md .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't close this comment for now. I will review this piece of code again. I will solve it as I have completed the review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhotari The command should not be limited, because MultiTopicsConsumer and MultiTopicsProducer need to receive this command once the topics' partition is changed. The regexp consumer is also relies on the command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhotari The command should not be limited, because
MultiTopicsConsumerandMultiTopicsProducerneed to receive this command once the topics' partition is changed. The regexp consumer is also relies on the command.
@poorbarcode There's perhaps some misunderstanding. Memory limiting won't prevent this command from being executed unless it times out or the queue fills up. It is absolutely necessary to handle also CommandWatchTopicListSuccess / CommandWatchTopicUpdate related commands so that the broker doesn't run out of memory and that's also explained in PIP-442. Handling CommandWatchTopicListSuccess / CommandWatchTopicUpdate related commands with the memory limiter will help the broker remain stable regardless of a large amount of regex consumers connecting at once. When the limit is reached, it will queue up the requests until other requests have completed.
Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless it times out or the queue fills up
That situation is what I want to say. It will cause consumers to be unable to receive messages that were sent to the new partitions until the client application restarts.
...ker-common/src/main/java/org/apache/pulsar/broker/topiclistlimit/TopicListMemoryLimiter.java
Show resolved
Hide resolved
|
@codelipenghui @poorbarcode I've addressed the review comments. PTAL |
Review comments were addressed earlier.
PIP-442
Motivation
See PIP-442.
Apache Pulsar brokers currently lack memory limits for topic listing operations (
CommandGetTopicsOfNamespaceandCommandWatchTopicList), creating a significant gap in the broker's otherwise comprehensive memory management framework. This can lead to:While Pulsar has robust memory limits for message publishing (
maxMessagePublishBufferSizeInMB), managed ledger operations (managedLedgerMaxReadsInFlightSizeInMB,managedLedgerCacheSizeMB), and concurrent requests (maxConcurrentLookupRequest), topic listing operations remain unbounded.Modifications
This PR implements PIP-442 by introducing memory-aware flow control for topic listing operations:
Core Components
AsyncSemaphore Interface & Implementation
AsyncDualMemoryLimiter Interface & Implementation
AsyncDualMemoryLimiterUtil
TopicListMemoryLimiter
Integration Points
Broker Integration:
ServerCnx.handleGetTopicsOfNamespace()to acquire initial heap permits (1KB estimate), update to actual size after topic retrieval, then acquire direct memory permits for response serializationPulsarCommandSenderImplto acquire direct memory permits before serialization and release after write completionTopicListServicefor watch command memory controlTopicListMemoryLimiterinBrokerServicewith configured limitsProxy Integration:
LookupProxyHandler.handleGetTopicsOfNamespace()with similar permit acquisition flowTopicListMemoryLimiterinProxyServicewith configured limitsConfiguration
Added six new configuration parameters to
broker.conf,proxy.conf, andstandalone.conf:Metrics
Added 12 new metrics per instance (broker/proxy) with both Prometheus and OpenTelemetry support:
Verifying this change
This change added tests and can be verified as follows:
Added
PatternConsumerBackPressureMultipleConsumersTest: Integration test that creates 8,192 partitioned topics and sends 500 concurrentgetTopicsOfNamespacerequests from 200 different client connections. The test intentionally reduces available direct memory to reproduce memory pressure scenarios and verifies all requests complete successfully with memory limiting in place.Added
ProxyPatternConsumerBackPressureMultipleConsumersTest: Extended version of the broker test that routes requests through the Pulsar Proxy to verify memory limiting works correctly in proxy scenarios.Added
AsyncSemaphoreImplTest: Comprehensive unit tests for the async semaphore implementation covering:Added
AsyncDualMemoryLimiterImplTest: Unit tests for the dual memory limiter covering:Added
AsyncDualMemoryLimiterUtilTest: Tests for utility methods covering:Verified metrics: Tests validate that all 26 metrics are properly registered and report correct values.
Documentation
docdoc-requireddoc-not-neededdoc-complete