
[fix] [broker] reader stuck with marker message though hasMessageAvailable() returns true #21951

Closed
wants to merge 13 commits into from

@thetumbled (Member) commented Jan 23, 2024

Motivation

#21718 fixed a problem where the compaction reader gets stuck when it meets a marker message, such as a transaction marker. The reader gets stuck because hasMessageAvailable() relies on getLastMessageId, which returns the MessageId m1 of the marker message. The broker, however, filters marker messages out transparently: the compaction reader has to read up to m1 before compaction can finish, but the broker never dispatches the marker message (m1) to the client, so the reader waits forever.
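To make the failure mode concrete, here is a minimal, self-contained simulation. It does not use the Pulsar client: `Entry`, `SimulatedTopic`, `lastMessageId()`, `readNext()` and `StuckReaderDemo` are hypothetical stand-ins, assuming that (a) the last-message-ID lookup returns the position of the last entry even when it is a marker, and (b) dispatch silently skips marker entries. Under those assumptions a reader that loops until it has seen the last message ID never gets there.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical ledger entry: its position and whether it is a marker (e.g. a txn commit marker). */
record Entry(long id, boolean marker) {}

/** Hypothetical, simplified topic; this is not the Pulsar API. */
final class SimulatedTopic {
    private final List<Entry> entries;

    SimulatedTopic(List<Entry> entries) {
        this.entries = List.copyOf(entries);
    }

    /** Mirrors the problematic behaviour: returns the last entry's id even if it is a marker. */
    long lastMessageId() {
        return entries.get(entries.size() - 1).id();
    }

    /** Mirrors broker dispatch: the next non-marker entry after {@code afterId}, skipping markers. */
    Optional<Entry> readNext(long afterId) {
        return entries.stream()
                .filter(e -> e.id() > afterId && !e.marker())
                .findFirst();
    }
}

final class StuckReaderDemo {
    /**
     * Reads until the last delivered id reaches lastMessageId(), like a
     * "while (hasMessageAvailable()) readNext()" loop. Returns true if the
     * target was reached, false if dispatch ran dry first; a real reader
     * would block in readNext() at that point instead of returning.
     */
    static boolean readsUpToLastMessageId(SimulatedTopic topic) {
        long target = topic.lastMessageId();
        long lastRead = -1;
        while (lastRead < target) {                 // hasMessageAvailable() keeps saying true
            Optional<Entry> next = topic.readNext(lastRead);
            if (next.isEmpty()) {
                return false;                       // nothing more will ever be dispatched: stuck
            }
            lastRead = next.get().id();
        }
        return true;
    }
}
```

With entries `0`, `1` and a trailing marker `2`, `readsUpToLastMessageId` returns `false`; drop the trailing marker and it returns `true`.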

The solution #21718 takes is to stop filtering such messages when the subscription name is __compaction and to transfer the filtering responsibility to the user (the compaction reader), so that readNext and hasMessageAvailable stay consistent.
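The #21718 workaround can be modelled the same way. In this hypothetical sketch (not Pulsar code; `SubscriptionAwareDispatch`, `readNext` and `readAll` are illustrative names), dispatch filters markers unless the subscription is named `__compaction`. That reader therefore receives the marker, reaches the last message ID, and drops the marker on its own side, while any other subscription still gets stuck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical ledger entry: its position and whether it is a marker. */
record Entry(long id, boolean marker) {}

final class SubscriptionAwareDispatch {
    static final String COMPACTION_SUBSCRIPTION = "__compaction";

    /** Hypothetical dispatch: markers are filtered unless the subscription is the compaction one. */
    static Optional<Entry> readNext(List<Entry> entries, long afterId, String subscription) {
        boolean filterMarkers = !COMPACTION_SUBSCRIPTION.equals(subscription);
        return entries.stream()
                .filter(e -> e.id() > afterId)
                .filter(e -> !(filterMarkers && e.marker()))
                .findFirst();
    }

    /**
     * Reads until the last entry's id (markers included). Returns the ids the
     * reader actually processes, or Optional.empty() if it would get stuck.
     */
    static Optional<List<Long>> readAll(List<Entry> entries, String subscription) {
        long target = entries.get(entries.size() - 1).id();
        long lastRead = -1;
        List<Long> processed = new ArrayList<>();
        while (lastRead < target) {
            Optional<Entry> next = readNext(entries, lastRead, subscription);
            if (next.isEmpty()) {
                return Optional.empty();   // marker was filtered at dispatch: stuck
            }
            Entry e = next.get();
            lastRead = e.id();             // the marker still advances the read position
            if (!e.marker()) {             // client-side filtering done by the reader itself
                processed.add(e.id());
            }
        }
        return Optional.of(processed);
    }
}
```

The design choice is visible in the sketch: consistency is restored only for the one subscription name that opts out of broker-side filtering.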

However, that patch only fixes the problem for compaction. Normal users may rely on the same kind of logic (checking hasMessageAvailable() and then calling readNext()) and can still get stuck.

Modifications

The root cause of this kind of stuck problem is that getLastMessageId does not fully validate the message it returns, although it already performs some checks such as maxReadPosition.
We need to refactor getLastMessageId to ensure that the message corresponding to the ID it returns is valid, i.e. a message the broker will actually dispatch to the client.
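A minimal sketch of the idea, assuming a simplified in-memory ledger: start at the last entry (capped at maxReadPosition) and walk backwards until reaching an entry that is neither a marker nor part of an aborted transaction. `Entry`, `LastValidEntryLookup` and `lastValidPosition` are hypothetical names, and this is not the PR's actual `readLastValidEntry`, which works asynchronously on managed-ledger entries.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.Set;

/** Hypothetical ledger entry: position, marker flag, owning txn id (-1 if non-transactional). */
record Entry(long position, boolean marker, long txnId) {}

final class LastValidEntryLookup {
    /** The position found (if any) plus how many entries had to be examined. */
    record Result(OptionalLong position, int entriesRead) {}

    /**
     * Walks backwards from min(last entry, maxReadPosition) and returns the first
     * entry a reader would actually receive: not a marker, not from an aborted txn.
     * Assumes {@code entries} is sorted by position.
     */
    static Result lastValidPosition(List<Entry> entries, long maxReadPosition, Set<Long> abortedTxns) {
        int read = 0;
        for (int i = entries.size() - 1; i >= 0; i--) {
            Entry e = entries.get(i);
            if (e.position() > maxReadPosition) {
                continue; // beyond maxReadPosition: skipped without being read
            }
            read++;
            boolean aborted = e.txnId() >= 0 && abortedTxns.contains(e.txnId());
            if (!e.marker() && !aborted) {
                return new Result(OptionalLong.of(e.position()), read);
            }
        }
        return new Result(OptionalLong.empty(), read); // no dispatchable entry at all
    }
}
```

The `entriesRead` counter makes the cost explicit: it grows with the number of trailing invalid entries. For entries `0`, `1 (txn 7)`, `2 (txn 7)`, `3 (abort marker, txn 7)` with txn 7 aborted, the lookup returns position 0 after examining all four entries.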

Verifying this change

  • Make sure that the change passes the CI checks.


This change added tests and can be verified as follows:
(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: thetumbled#34

@codelipenghui self-requested a review on January 23, 2024 03:13
```java
     * @param lastPosition
     * @param ml
     */
    private void readLastValidEntry(CompletableFuture<MessageMetadata> entryMetaDataFuture,
```
Member

This will search forward one entry at a time until a valid position is found; I'm not sure about the performance.

Member Author

If there is an aborted txn with lots of messages to be discarded, the search may be time-consuming. But being slow is better than getting stuck, right?

@thetumbled (Member Author)

As this may impact performance and the corner case is probably rare, we are closing this PR.
If anyone runs into this corner case, you can use this patch as a fix.
