Skip to content

Conversation

Meet0861
Copy link

@Meet0861 Meet0861 commented Sep 18, 2025

Fixes: #24754

Motivation

The topK bundle selection fails due to the sorting failure in the partition sort algo being used here.
Partition sort uses, NamespaceBundleStats' + BundleData's custom ccompararable implementations where it checks throughput, connections, cache size etc against the defined threshold and when values are within the defined thresholds, they are considered "equal" (return 0), but this creates transitivity violations.
It violates the transitivity property required by Java's Comparable interface, causing Collections.sort() to potentially throw IllegalArgumentException with "Comparison method violates its general contract" error.
Error Log:
2025-09-12T22:29:58.832386492+05:30 16:59:58.832 [pulsar-load-manager-1-1] WARN org.apache.pulsar.broker.loadbalance.LoadResourceQuotaUpdaterTask - Error write resource quota - java.lang.IllegalArgumentException: Comparison method violates its general contract

Due to this failure, the job - writeBundleDataOnZooKeeper(leader broker writes bundle data aggregated from all brokers to metadata store) in modularLoadManager may fail link and can cause degradation in productions due to failure/inconsistencies in LB decisions

Modifications

added strict NamespaceBundleStats comparator and BundleData comparator to be used for sorting for topk bundle comparison which is being used by ModularLoadManager and ExtensibleLoadManager

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added unit tests to validate the fix

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/Meet0861/pulsar/tree/fix-transitivityViolationInNamespaceBundleStatsAndBundledataComparator

Copy link

@Meet0861 Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Sep 18, 2025
@lhotari lhotari changed the base branch from branch-4.0 to master September 18, 2025 11:41
@lhotari
Copy link
Member

lhotari commented Sep 18, 2025

@Meet0861 Pull requests should target the master branch. I changed the base, but that messed up things. You can rebase on your side and do a force-push. One way is to do this is:

git fetch origin # assuming "origin" is apache/pulsar
git reset --hard origin/master
git cherry-pick 8ce1fc4 # your PR commit

then push changes to the PR branch

@Meet0861 Meet0861 force-pushed the fix-transitivityViolationInNamespaceBundleStatsAndBundledataComparator branch from 8ce1fc4 to af19a10 Compare September 22, 2025 12:40
@lhotari
Copy link
Member

lhotari commented Sep 23, 2025

@Meet0861 Thanks for the effort of debugging and drilling down to the root cause.
Since the compareTo contracts were violated in NamespaceBundleStats, TimeAverageMessageData and ResourceUnitRanking classes, I have addressed those in an alternative approach to address the issue, #24772. It wouldn't have been useful to keep the invalid implementations since they could have caused similar problems in the future when sorting would have been used. The main purpose of compareTo methods is sorting.
The original reason for the Math.abs was to compare with a coarse granularity (or resolution). I have addressed that without Math.abs in the PR so that the value is divided by the comparison resolution/granularity at comparison. The behavior isn't exactly the same as Math.abs, but it does retain the behavior of comparing at the desired level of granularity.
Since the compareTo methods are now fixed with #24772, it won't be necessary to have custom comparators that this PR provides.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Failure in topK bundle selection due to the transitivity Contract Violation by namespaceBundleStats' & BundleData's compareTo()
3 participants