Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] fix incorrect delete sub when lastActive do not update. #21692

Closed

Conversation

thetumbled
Copy link
Member

@thetumbled thetumbled commented Dec 8, 2023

Motivation

PR #17573 has fixed cases that consumer commit offset periodically. It update lastActive field of cursor when consumer commit the offset.
But there is still corner case that the consumer do not commit the offset for any reason. For example, there is no content in the topic could be read, so the consumer do not commit the offset reasonably. In such case we has no reason delete the subscription.
Anyway, we should be cautious with the delete operation, which could result into data duplication or loss.

Modifications

Update the lastActive field when check the inactive sub.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: thetumbled#31

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Dec 8, 2023
@thetumbled thetumbled changed the title [Fix] [Broker] fix incorrect delete sub when lastActive do not update. [fix] [broker] fix incorrect delete sub when lastActive do not update. Dec 8, 2023
Copy link
Contributor

@poorbarcode poorbarcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thetumbled Could you add a test for this case? Thanks

Copy link
Contributor

@Technoboy- Technoboy- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we update the cursor when every check ?

@thetumbled
Copy link
Member Author

thetumbled commented Dec 14, 2023

why do we update the cursor when every check ?

If the consumer do not commit the offset for any reason, the lastActive will never be updated, which will result into incorrect subscription deletion.
As we know that the consumer is connected, we should update lastActive when doing check.

@Technoboy-
Copy link
Contributor

why do we update the cursor when every check ?

If the consumer do not commit the offset for any reason, the lastActive will never be updated, which will result into incorrect subscription deletion. As we know that the consumer is connected, we should update lastActive when doing check.

in this case, the consumer is connected, how could the code run into sub.delete

if (System.currentTimeMillis() - sub.cursor.getLastActive() > expirationTimeMillis) {
sub.delete().thenAccept(v -> log.info("[{}][{}] The subscription was deleted due to expiration "
+ "with last active [{}]", topic, subName, sub.cursor.getLastActive()));
}

@thetumbled
Copy link
Member Author

why do we update the cursor when every check ?

If the consumer do not commit the offset for any reason, the lastActive will never be updated, which will result into incorrect subscription deletion. As we know that the consumer is connected, we should update lastActive when doing check.

in this case, the consumer is connected, how could the code run into sub.delete

if (System.currentTimeMillis() - sub.cursor.getLastActive() > expirationTimeMillis) {
sub.delete().thenAccept(v -> log.info("[{}][{}] The subscription was deleted due to expiration "
+ "with last active [{}]", topic, subName, sub.cursor.getLastActive()));
}

If the broker1 shutdown ungracefully, the topic will be loaded on broker2 without update the lastActive field to zk, and the broker2 load the topic with old lastActive field from zk. If broker2 check the inactive sub before any consumer connect to this sub, the sub will be deleted.
This problem is simillar to #17573.

@Technoboy- Technoboy- added this to the 3.3.0 milestone Dec 22, 2023
@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@thetumbled
Copy link
Member Author

@thetumbled Could you add a test for this case? Thanks

I push a new commit to add the test code, PTAL, thanks.

@thetumbled
Copy link
Member Author

thetumbled commented May 28, 2024

PersistentSubscription sub = persistentTopic.getSubscription("sub1");

// shutdown pulsar ungracefully
// disable the updateLastActive method to simulate the ungraceful shutdown
Copy link
Contributor

@poorbarcode poorbarcode May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to mock an ungraceful shutdown, but the exact behavior you mocked is below:

  • the result of cursor.updateLastActive() will not be persisted.
  • the result of sub.cursor.updateLastActive() you added at this line will be persisted.

This mock is wrong.

Copy link
Member Author

@thetumbled thetumbled May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't ensure that the updated timestamp is persisted in crash time. With this PR, we can fix most of the cases, as we update it every time broker do expiration check.
But strictly speaking, we can't fix all the cases. If we always fail to persist the cursor info, the incorrect sub deletion occurs too.
Incorrect deletion of subscription causes serious consequence, such as duplication or data lost. In my opinion, missing deletions are better than incorrect deletion, this is what PR #22794 try to do.


// wait for 1min, but consumer is still connected all the time.
// so subscription should not be deleted.
Thread.sleep(60000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep too long

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but the minimum expiration time is 1min, we have to wait for 1min to trigger the bug.

Copy link
Contributor

@liangyepianzhou liangyepianzhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update last active when a consumer disconnect instead of every check.

@thetumbled
Copy link
Member Author

closed as #22794 has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs release/2.10.7 release/2.11.5 release/3.0.6 release/3.3.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants