8345668: ZoneOffset.ofTotalSeconds performance regression #22854

Open, 3 commits to be merged into master

Conversation

naotoj
Member

@naotoj naotoj commented Dec 20, 2024

The change made in JDK-8288723 seems innocuous, but it caused this performance regression. This PR partially reverts that change, restoring the original code in the places that involve computeIfAbsent(). It also adds a benchmark that calls ZoneOffset.ofTotalSeconds(0) 1,000 times per invocation; with the fix, the operation time improves from 3,946ns to 2,241ns.
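For readers following along, here is a minimal sketch of the get-then-putIfAbsent shape that replaces computeIfAbsent() on the cached path. It is not the actual JDK source: OffsetCacheSketch and its nested Offset class are hypothetical stand-ins for ZoneOffset, and the 15-minute caching rule is taken from the review discussion on this PR.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the caching logic discussed in this PR; not JDK code.
final class OffsetCacheSketch {
    /** Hypothetical immutable value type playing the role of ZoneOffset. */
    static final class Offset {
        final int totalSeconds;
        Offset(int totalSeconds) { this.totalSeconds = totalSeconds; }
    }

    private static final ConcurrentMap<Integer, Offset> SECONDS_CACHE = new ConcurrentHashMap<>();

    static Offset ofTotalSeconds(int totalSeconds) {
        // Assumption: only multiples of 15 minutes are cached.
        if (totalSeconds % (15 * 60) != 0) {
            return new Offset(totalSeconds);
        }
        Integer key = totalSeconds;              // boxing; may allocate outside the Integer cache
        Offset result = SECONDS_CACHE.get(key);  // common case: a plain lookup, no lambda
        if (result == null) {
            Offset created = new Offset(totalSeconds);
            // putIfAbsent returns the previous value if another thread won the race.
            Offset existing = SECONDS_CACHE.putIfAbsent(key, created);
            result = (existing != null) ? existing : created;
        }
        return result;
    }
}
```

Repeated calls with the same cached value return the same instance, while uncached values get a fresh object on every call.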


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8345668: ZoneOffset.ofTotalSeconds performance regression (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22854/head:pull/22854
$ git checkout pull/22854

Update a local copy of the PR:
$ git checkout pull/22854
$ git pull https://git.openjdk.org/jdk.git pull/22854/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22854

View PR using the GUI difftool:
$ git pr show -t 22854

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22854.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Dec 20, 2024

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 20, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr label (Pull request is ready for review) Dec 20, 2024
@openjdk

openjdk bot commented Dec 20, 2024

@naotoj The following labels will be automatically applied to this pull request:

  • core-libs
  • i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Dec 20, 2024

Webrevs

return SECONDS_CACHE.computeIfAbsent(totalSeconds, totalSecs -> {
ZoneOffset result = new ZoneOffset(totalSecs);
Integer totalSecs = totalSeconds;
ZoneOffset result = SECONDS_CACHE.get(totalSecs);
Contributor

Here, each call may allocate an Integer object. At most 148 ZoneOffsets need to be cached here, so an AtomicReferenceArray would be a better fit than a ConcurrentHashMap.

Contributor

@wenshao wenshao Dec 20, 2024

For example:

static final AtomicReferenceArray<ZoneOffset> MINUTES_15_CACHE = new AtomicReferenceArray<>(37 * 4);

    public static ZoneOffset ofTotalSeconds(int totalSeconds) {
        // ...
        int minutes15Rem = totalSeconds / (15 * SECONDS_PER_MINUTE);
        if (totalSeconds - minutes15Rem * 15 * SECONDS_PER_MINUTE == 0) {
            int cacheIndex = minutes15Rem + 18 * 4;
            ZoneOffset result = MINUTES_15_CACHE.get(cacheIndex);
            if (result == null) {
                result = new ZoneOffset(totalSeconds);
                if (!MINUTES_15_CACHE.compareAndSet(cacheIndex, null, result)) {
                    result = MINUTES_15_CACHE.get(cacheIndex);
                }
            }
            return result;
        }
       // ...
    }

Member Author

@naotoj naotoj Dec 20, 2024

Hi Shaojin,
Thanks for the suggestion, but I am not planning to improve the code more than backing out the offending fix at this time. (btw, cache size would be 149 as 18:00 and -18:00 are inclusive)

Contributor

Can I submit a PR to make this improvement?

Member

@liach liach Dec 21, 2024

@wenshao I agree with your proposal. Also for this part:

ZoneOffset result = MINUTES_15_CACHE.get(cacheIndex);
if (result == null) {
    result = new ZoneOffset(totalSeconds);
    if (!MINUTES_15_CACHE.compareAndSet(cacheIndex, null, result)) {
        result = MINUTES_15_CACHE.get(cacheIndex);
    }
}

I recommend a rewrite:

ZoneOffset result = MINUTES_15_CACHE.getPlain(cacheIndex);
if (result == null) {
    result = new ZoneOffset(totalSeconds);
    ZoneOffset existing = MINUTES_15_CACHE.compareAndExchange(cacheIndex, null, result);
    return existing == null ? result : existing;
}

getPlain is safe here because ZoneOffset is thread-safe: once a thread observes a non-null ZoneOffset reference, it can use the object. compareAndExchange also avoids the extra read when the racy attempt to publish the computed ZoneOffset fails.
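Putting the two suggestions together, a self-contained sketch could look like the following. It is an illustration only, not proposed JDK code: QuarterHourCacheSketch and its nested Offset class are hypothetical stand-ins for ZoneOffset, and the range check assumes the ±18:00 bounds mentioned earlier in this thread. The array length 37 * 4 is kept from the example above; the indices actually used run from 0 to 144.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical stand-in for the array-based cache discussed above; not JDK code.
final class QuarterHourCacheSketch {
    /** Hypothetical immutable value type playing the role of ZoneOffset. */
    static final class Offset {
        final int totalSeconds;
        Offset(int totalSeconds) { this.totalSeconds = totalSeconds; }
    }

    private static final int SECONDS_PER_QUARTER_HOUR = 15 * 60;
    private static final int MAX_SECONDS = 18 * 60 * 60; // assumed bound of +/-18:00

    private static final AtomicReferenceArray<Offset> MINUTES_15_CACHE =
            new AtomicReferenceArray<>(37 * 4);

    static Offset ofTotalSeconds(int totalSeconds) {
        if (totalSeconds < -MAX_SECONDS || totalSeconds > MAX_SECONDS) {
            throw new IllegalArgumentException("Offset not in the range -18:00 to +18:00");
        }
        int quarterHours = totalSeconds / SECONDS_PER_QUARTER_HOUR;
        if (quarterHours * SECONDS_PER_QUARTER_HOUR != totalSeconds) {
            return new Offset(totalSeconds); // not a multiple of 15 minutes: not cached
        }
        int cacheIndex = quarterHours + 18 * 4; // maps -72..72 to 0..144
        // Plain read: Offset is immutable with a final field, so any observed reference is usable.
        Offset result = MINUTES_15_CACHE.getPlain(cacheIndex);
        if (result == null) {
            Offset created = new Offset(totalSeconds);
            // Returns the witness value: null if our write won, else the winner's object.
            Offset existing = MINUTES_15_CACHE.compareAndExchange(cacheIndex, null, created);
            result = (existing == null) ? created : existing;
        }
        return result;
    }
}
```

No Integer boxing happens on this path, and a thread that loses the race reuses the winner's object without a second read of the array.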

@Benchmark
public void ofTotalSeconds() {
for (int i = 0; i < 1_000; i++) {
ZoneOffset.ofTotalSeconds(0);
Member

This benchmark method should accept a Blackhole, and the return value of ofTotalSeconds must be sent to the Blackhole.consume method.

Member

This benchmark probably works at present only because of the cache interactions in ofTotalSeconds, which prevent the JIT compiler from proving that the call is side-effect free. Had the method been as simple as a decimal computation, or if the cache becomes a stable map, the JIT compiler could eliminate the static factory method call entirely, and the benchmark would be measuring the performance of a no-op invocation.

@liach
Member

liach commented Dec 21, 2024

The putIfAbsent remark from Roger Riggs applies to DateTimeTextProvider and DecimalStyle too. I think reusing the existing result in these two places is beneficial, as the replaced computeIfAbsent returns the same object identity, which may be helpful for quick equals comparisons.
