
[fix][broker] Fix repeatedly acquired pending reads quota #23869

Merged

Conversation

poorbarcode
Contributor

Motivation

Background

  • The broker limits pending reads to Bookies when managedLedgerMaxReadsInFlightSize is enabled.
  • [managed-ledger] Do not send duplicate reads to BK/offloaders #17241 introduced a mechanism that merges BK read requests when possible; for example (see the sketch after this list):
    • Request-1: read entries 50~59
    • Request-2: read entries 50~69
    • Request-2 waits for Request-1 and only sends a real read for 60~69 after Request-1 finishes.
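
A minimal sketch of the merge idea, using hypothetical types for illustration (the real logic lives in PendingReadsManager); entry ranges are treated as inclusive:

```java
// Hypothetical illustration of the read-merge behaviour described above; not the
// actual PendingReadsManager code.
public class ReadMergeSketch {
    record Range(long first, long last) {}

    // Returns the part of 'wanted' that the in-flight read does not cover and that
    // must still be read from BookKeeper, or null if it is fully covered.
    static Range missingTail(Range inFlight, Range wanted) {
        if (wanted.last() <= inFlight.last()) {
            return null; // fully covered: just wait for the in-flight read to complete
        }
        return new Range(Math.max(wanted.first(), inFlight.last() + 1), wanted.last());
    }

    public static void main(String[] args) {
        Range request1 = new Range(50, 59);
        Range request2 = new Range(50, 69);
        // Request-2 attaches to Request-1 for 50~59 and only reads 60~69 itself.
        System.out.println(missingTail(request1, request2)); // Range[first=60, last=69]
    }
}
```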

Issue

  • Request-2 above acquires the pending reads quota repeatedly (see the sketch after this list):
    • It acquires quota for the range 50~70 when it is created.
    • It acquires quota again for another range, 61~70, after Request-1 is finished.
  • Expected: Request-2 acquires 20 quota.
  • Actual: Request-2 acquires 40 quota.
  • You can run testPreciseLimitation to reproduce the issue.
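
Conceptually, the double accounting looks like this (a hedged sketch with a hypothetical ReadQuota interface; the real accounting is done against managedLedgerMaxReadsInFlightSize in the range entry cache):

```java
// Hypothetical sketch of the bug; ReadQuota is an illustrative interface, not broker code.
public class DoubleAcquireSketch {
    interface ReadQuota {
        void acquire(long units);
        void release(long units);
    }

    // Before this fix: the same logical read accounts for its quota twice.
    static void request2Buggy(ReadQuota quota, long estimatedUnits) {
        quota.acquire(estimatedUnits);  // acquired when Request-2 is created
        // ... Request-1 completes, Request-2 now issues its residual read ...
        quota.acquire(estimatedUnits);  // acquired again for the residual read
        // Request-2 has now reserved twice the quota it actually needs (40 instead of the
        // expected 20 in the description above), so the in-flight limit fires too early.
    }
}
```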

Modifications

  • Fix the issue so that the pending reads quota is acquired only once per read request (see the sketch below).
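
One possible shape of the fix, shown as a hedged sketch (hypothetical names; the actual change is in RangeEntryCacheImpl/PendingReadsManager): the path that has already reserved quota tells the residual read path not to reserve it again.

```java
// Hypothetical sketch only; parameter and type names are illustrative, not the PR's API.
public class SingleAcquireSketch {
    interface ReadQuota {
        void acquire(long units);
        void release(long units);
    }

    // After the fix: quota is acquired at most once per logical read.
    static void issueRead(ReadQuota quota, long estimatedUnits, boolean quotaAlreadyAcquired) {
        if (!quotaAlreadyAcquired) {
            quota.acquire(estimatedUnits);
        }
        // ... send the read to BookKeeper; whoever acquired the quota releases it
        // once the entries have been handed to the caller ...
    }
}
```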

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added this to the 4.1.0 milestone Jan 20, 2025
@poorbarcode poorbarcode self-assigned this Jan 20, 2025
@poorbarcode poorbarcode changed the title [fix][broker]Fix repeated acquire pending reads quota [fix][broker]Fix repeatedly acquire pending reads quota Jan 20, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 20, 2025
@lhotari
Member

lhotari commented Jan 23, 2025

I agree that this issue exists; however, there are broader issues in the managedLedgerMaxReadsInFlightSizeInMB solution.

Sharing some context about the issues I have found and what I currently have in progress:

I have reported issues #23482, #23504, #23505 and #23506.
These issues are related to the broker cache and managedLedgerMaxReadsInFlightSizeInMB and dispatcherMaxReadSizeBytes limits.

I already have changes to address these issues in an experimental branch, pending the submission of individual PRs. Addressing the lack of caching for replay queue messages requires broader changes to the broker cache, and those will be covered by a new PIP.

Some earlier details were shared in this comment: #23524 (comment).

Due to issues #23482 and #23506, I don't think this PR alone would resolve the problems. Regarding broker OOMEs, issue #23504 would also need to be resolved. It's possible that dispatcherDispatchMessagesInSubscriptionThread=false and managedLedgerReadEntryTimeoutSeconds=15 (or some other reasonable value) could mitigate some of the issues.
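
For reference, the mitigation mentioned above expressed as broker.conf settings (the values are the ones suggested in this comment; adjust for your environment):

```properties
# Disable dispatching messages in a separate subscription thread (the default is true).
dispatcherDispatchMessagesInSubscriptionThread=false
# Timeout for reading entries from BookKeeper; 0 (the default) disables the timeout.
managedLedgerReadEntryTimeoutSeconds=15
```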

@poorbarcode
Contributor Author

@lhotari

I already have changes to address these issues in an experimental branch, pending the submission of individual PRs. Addressing the lack of caching for replay queue messages requires broader changes to the broker cache, and those will be covered by a new PIP.

Some earlier details were shared in this comment: #23524 (comment).

Due to issues #23482 and #23506, I don't think this PR alone would resolve the problems. Regarding broker OOMEs, issue #23504 would also need to be resolved. It's possible that dispatcherDispatchMessagesInSubscriptionThread=false and managedLedgerReadEntryTimeoutSeconds=15 (or some other reasonable value) could mitigate some of the issues.

Agreed; let's fix them one by one, which is easier to review, and we should add tests for each case.

@poorbarcode poorbarcode requested a review from lhotari January 23, 2025 08:30
@heesung-sn
Contributor

I think we should use AsyncTokenBucket or Guava's RateLimiter.tryAcquire for this rate limiter.

IMHO, the current logic, which requires the caller to release the acquired permit, is error-prone when the caller forgets to release it. Instead, I think we'd better use a token-bucket-based limiter, which can automatically refill the bucket.
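
For illustration, a minimal sketch of the Guava suggestion (the rate value is made up; this is not code from this PR, and as noted in the reply below a rate-based bucket may not fit the in-flight-size use case exactly):

```java
import com.google.common.util.concurrent.RateLimiter;

// Hypothetical sketch of the token-bucket idea: permits refill over time, so nothing
// has to be explicitly released by the caller.
public class ReadRateLimitSketch {
    // e.g. allow roughly 100 MB/s of reads; RateLimiter refills permits automatically.
    private final RateLimiter readLimiter = RateLimiter.create(100 * 1024 * 1024);

    boolean tryRead(int estimatedReadSizeBytes) {
        // Non-blocking: returns false immediately if the bucket lacks enough permits.
        return readLimiter.tryAcquire(estimatedReadSizeBytes);
    }
}
```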

@lhotari
Member

lhotari commented Jan 24, 2025

I think we should use AsyncTokenBucket or Guava's RateLimiter.tryAcquire for this rate limiter.

IMHO, the current logic, which requires the caller to release the acquired permit, is error-prone when the caller forgets to release it. Instead, I think we'd better use a token-bucket-based limiter, which can automatically refill the bucket.

@heesung-sn I agree, the current solution is problematic. However, a token bucket isn't optimal for this use case. I have a redesign of the managedLedgerMaxReadsInFlightSize solution in my experimental branch (example). There are a lot of different changes in the branch, so it might be hard to see what it's about. I'll extract it into a clean PR when the time comes.
First, #23892 and #23894 need reviews so that I can make further progress. Do you have a chance to review them?

Member

@lhotari lhotari left a comment

LGTM

@lhotari lhotari changed the title [fix][broker]Fix repeatedly acquire pending reads quota [fix][broker] Fix repeatedly acquire pending reads quota Jan 27, 2025
@lhotari lhotari changed the title [fix][broker] Fix repeatedly acquire pending reads quota [fix][broker] Fix repeatedly acquired pending reads quota Jan 27, 2025
@codecov-commenter

codecov-commenter commented Jan 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.24%. Comparing base (bbc6224) to head (977b33a).
Report is 871 commits behind head on master.

Additional details and impacted files


@@             Coverage Diff              @@
##             master   #23869      +/-   ##
============================================
+ Coverage     73.57%   74.24%   +0.66%     
+ Complexity    32624    32222     -402     
============================================
  Files          1877     1853      -24     
  Lines        139502   143618    +4116     
  Branches      15299    16310    +1011     
============================================
+ Hits         102638   106622    +3984     
+ Misses        28908    28610     -298     
- Partials       7956     8386     +430     
Flag Coverage Δ
inttests 26.78% <60.00%> (+2.19%) ⬆️
systests 23.16% <60.00%> (-1.16%) ⬇️
unittests 73.75% <100.00%> (+0.90%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...keeper/mledger/impl/cache/PendingReadsManager.java 86.80% <ø> (+0.14%) ⬆️
...keeper/mledger/impl/cache/RangeEntryCacheImpl.java 63.48% <100.00%> (+4.73%) ⬆️

... and 1025 files with indirect coverage changes

@lhotari lhotari merged commit 331a997 into apache:master Jan 27, 2025
64 of 67 checks passed
@lhotari
Member

lhotari commented Jan 27, 2025

I think we should use AsyncTokenBucket or Guava's RateLimiter.tryAcquire for this rate limiter.
IMHO, the current logic, which requires the caller to release the acquired permit, is error-prone when the caller forgets to release it. Instead, I think we'd better use a token-bucket-based limiter, which can automatically refill the bucket.

@heesung-sn I agree, the current solution is problematic. However, a token bucket isn't optimal for this use case. I have a redesign of the managedLedgerMaxReadsInFlightSize solution in my experimental branch (example). There are a lot of different changes in the branch, so it might be hard to see what it's about. I'll extract it into a clean PR when the time comes. First, #23892 and #23894 need reviews so that I can make further progress. Do you have a chance to review them?

@heesung-sn @poorbarcode To continue the improvements addressing issues with managedLedgerMaxReadsInFlightSize, I have created #23901, which addresses the problems I referred to earlier. Please review.

lhotari pushed a commit that referenced this pull request Jan 28, 2025
lhotari pushed a commit that referenced this pull request Jan 29, 2025
lhotari pushed a commit that referenced this pull request Jan 29, 2025
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 31, 2025
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 3, 2025