[fix][txn] fix concurrent error cause txn stuck in TransactionBufferHandlerImpl#endTxn #23551
@codelipenghui @congbobo184 Can you help review this PR?
lhotari
left a comment
LGTM
congbobo184
left a comment
LGTM
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@              Coverage Diff               @@
##             master    #23551       +/-   ##
=============================================
+ Coverage     38.56%    74.29%   +35.73%
- Complexity    13262     33920    +20658
=============================================
  Files          1856      1913       +57
  Lines        145287    149503     +4216
  Branches      16877     17372      +495
=============================================
+ Hits          56025    111074    +55049
+ Misses        81696     29582    -52114
- Partials       7566      8847     +1281
…andlerImpl#endTxn (#23551) Co-authored-by: fanjianye <[email protected]> (cherry picked from commit c4f125c)
…andlerImpl#endTxn (apache#23551) Co-authored-by: fanjianye <[email protected]> (cherry picked from commit c4f125c) (cherry picked from commit 74931c9)
…andlerImpl#endTxn (apache#23551) Co-authored-by: fanjianye <[email protected]> (cherry picked from commit c4f125c)
Fixes #23550
Motivation
After digging into the code, I found a concurrency bug in TransactionBufferHandlerImpl#checkRequestCredits() and checkPendingRequests(), which causes the issue above.
Currently, the TransactionBufferClientMaxConcurrentRequests config limits the number of concurrent requests. However, if a request and a response are executed in the following order, the request is permanently stuck in the queue.
(To simplify the case, assume the number of permits is 1.)
At this point no response is left that can trigger pendingRequest.remove, so every new request is added to pendingRequest but never executed.
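The interleaving described above can be replayed deterministically on a single thread. The following is only a minimal sketch: the class StuckReplay and its methods mayRunNow, enqueue and onResponse are hypothetical names that mimic the check-then-enqueue logic described here, not the actual TransactionBufferHandlerImpl code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the handler, reduced to a permit counter and a queue.
class StuckReplay {
    int permits = 1;                                  // max concurrent requests = 1
    final Queue<String> pending = new ArrayDeque<>(); // stands in for pendingRequest
    int sent = 0;

    // Step A of a request: read the state and decide whether it may run now.
    boolean mayRunNow() {
        if (permits > 0 && pending.isEmpty()) {
            permits--;
            sent++;
            return true;
        }
        return false; // the caller still has to enqueue the request (step B)
    }

    // Step B of a request: park it in the queue.
    void enqueue(String op) {
        pending.add(op);
    }

    // Response path: return the permit, then send whatever is queued.
    void onResponse() {
        permits++;
        while (permits > 0 && !pending.isEmpty()) {
            pending.poll();
            permits--;
            sent++;
        }
    }

    // Replays the bad ordering and returns how many requests are left queued
    // while nothing is in flight.
    static int replay() {
        StuckReplay h = new StuckReplay();
        h.mayRunNow();                          // request 1 takes the only permit
        boolean r2 = h.mayRunNow();             // request 2 sees permits == 0 ...
        h.onResponse();                         // ... response 1 arrives: queue still empty
        if (!r2) h.enqueue("req2");             // ... only now is request 2 enqueued
        if (!h.mayRunNow()) h.enqueue("req3");  // permit is free, but the queue is not empty
        return h.pending.size();
    }

    public static void main(String[] args) {
        System.out.println("stuck requests: " + replay()); // prints "stuck requests: 2"
    }
}
```

In this replay the permit is back to 1 and nothing is in flight, yet both queued requests wait for a response that will never come.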
Modifications
The root cause is that currently only onResponse() can trigger pendingRequest.remove. But when onResponse() runs, the requestOp may not yet have been added to pendingRequest.
It is hard to add a test for this concurrent case.
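One way to close this window, shown purely as an illustration and not necessarily the change made in this PR, is to let the enqueue path drain the queue as well, so that onResponse() is no longer the only trigger. The sketch below assumes hypothetical names (DrainAfterEnqueue, mayRunNow, drain) and replays the same ordering on a single thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: both the response path and the enqueue path drain the queue.
class DrainAfterEnqueue {
    final AtomicInteger permits = new AtomicInteger(1);
    final Queue<String> pending = new ConcurrentLinkedQueue<>();
    final AtomicInteger sent = new AtomicInteger();

    // Fast path: run immediately only if a permit is free and nothing is queued.
    boolean mayRunNow() {
        int p = permits.get();
        if (p > 0 && pending.isEmpty() && permits.compareAndSet(p, p - 1)) {
            sent.incrementAndGet();
            return true;
        }
        return false;
    }

    // Sends queued requests while permits are available.
    void drain() {
        while (true) {
            int p = permits.get();
            if (p <= 0 || pending.isEmpty()) {
                return;
            }
            if (permits.compareAndSet(p, p - 1)) {
                String op = pending.poll();
                if (op != null) {
                    sent.incrementAndGet();
                } else {
                    permits.incrementAndGet(); // queue emptied meanwhile: give the permit back
                }
            }
        }
    }

    // Enqueue, then re-check: a response may have freed a permit in between.
    void enqueue(String op) {
        pending.add(op);
        drain();
    }

    void onResponse() {
        permits.incrementAndGet();
        drain();
    }

    // Same ordering as the stuck case; returns the number of requests left queued.
    static int replay() {
        DrainAfterEnqueue h = new DrainAfterEnqueue();
        h.mayRunNow();                          // request 1 takes the only permit
        boolean r2 = h.mayRunNow();             // request 2 sees no permit
        h.onResponse();                         // response 1 arrives before the enqueue
        if (!r2) h.enqueue("req2");             // enqueue drains: request 2 is sent
        h.onResponse();                         // response 2 returns the permit
        if (!h.mayRunNow()) h.enqueue("req3");  // request 3 runs on the fast path
        return h.pending.size();
    }

    public static void main(String[] args) {
        System.out.println("stuck requests: " + replay()); // prints "stuck requests: 0"
    }
}
```

A single-threaded replay only shows that this ordering no longer strands a request; it is not a proof of correctness under real concurrency.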