Improve ASCII performance #568

Open
wants to merge 1 commit into
base: 2.19
from

Conversation


@sugmanue sugmanue commented Mar 9, 2025

Summary

Improve the ASCII case by creating a tight loop around it. All the changes follow a similar pattern: first attempt a tight loop over ASCII, then fall back whenever a non-ASCII char is found.

These changes show improvements of up to 7x for the ASCII case, and some for the multi-byte code path as well.
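
The pattern described in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the names `AsciiFastPathSketch`, `copyAscii` and `decode` are hypothetical, and the fallback simply delegates to the JDK's UTF-8 decoder instead of the parser's own multi-byte path.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class AsciiFastPathSketch {

    /**
     * Tight loop: copies bytes from in[start, end) into out (starting at outPtr)
     * for as long as they are ASCII. Returns the number of bytes copied.
     * Assumes out has room for at least (end - start) chars from outPtr.
     */
    static int copyAscii(byte[] in, int start, int end, char[] out, int outPtr) {
        int i = start;
        while (i < end) {
            byte b = in[i];
            if (b < 0) { // high bit set: not ASCII, leave it to the fallback
                break;
            }
            out[outPtr++] = (char) b;
            ++i;
        }
        return i - start;
    }

    /** Decodes UTF-8 bytes, trying the ASCII-only loop before falling back. */
    static String decode(byte[] utf8) {
        char[] out = new char[utf8.length];
        int copied = copyAscii(utf8, 0, utf8.length, out, 0);
        if (copied == utf8.length) {
            return new String(out, 0, copied); // pure ASCII: done
        }
        // Fallback (assumption: the JDK decoder stands in for the general path).
        // In valid UTF-8 the first non-ASCII byte starts a multi-byte sequence,
        // so splitting the input here does not cut a character in half.
        String rest = new String(utf8, copied, utf8.length - copied, StandardCharsets.UTF_8);
        return new StringBuilder(utf8.length).append(out, 0, copied).append(rest).toString();
    }
}
```

The point of the split is that the hot loop contains only a sign check and a store, which is the kind of loop a JIT compiler tends to handle well; everything else lives in the fallback.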

The _2 cases are for the same sizes but without chunking; the CBOR for them was created using json2cbor to avoid chunking. All these benchmark tests can be found here.

Benchmark Results

All the benchmarks can be found here.

Benchmark                (flavor)      (size)  Mode  Cnt       Score       Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     283.767 ±     6.141  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     276.760 ±     4.900  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     751.102 ±     9.724  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     412.084 ±     0.906  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5    1162.698 ±    34.077  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5     537.463 ±     1.371  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   97592.433 ±   652.295  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   12433.531 ±    55.798  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5  192964.487 ±   764.024  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5   23451.347 ±    57.090  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5  192113.905 ±  1270.924  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5   24381.351 ±   601.883  ns/op (after)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     352.329 ±     8.878  ns/op (before)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     369.720 ±     8.236  ns/op (after)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1393.845 ±     9.093  ns/op (before)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1477.070 ±    14.845  ns/op (after)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2492.102 ±   145.094  ns/op (before)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2634.623 ±    20.539  ns/op (after)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  313477.398 ±  3187.925  ns/op (before)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  309304.797 ± 10424.273  ns/op (after)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  614833.426 ± 12688.680  ns/op (before)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  409757.983 ±  5656.776  ns/op (after)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  775988.821 ±  8871.677  ns/op (before)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  381908.757 ±   855.804  ns/op (after)
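
For readers who want the before/after ratios rather than raw scores, a small helper like the one below can compute them. It is only a convenience sketch: the class and method names are hypothetical, and the scores are copied from the JDK 17 table above (ns/op, lower is better).

```java
// Hypothetical helper; scores are copied from the JDK 17 results above.
final class SpeedupSketch {

    /** Ratio of the "before" score to the "after" score; values above 1 mean faster. */
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    public static void main(String[] args) {
        System.out.printf("ASCII MEDIUM:   %.2fx%n", speedup(751.102, 412.084));
        System.out.printf("ASCII X_LARGE:  %.2fx%n", speedup(97592.433, 12433.531));
        System.out.printf("ASCII XX_LARGE: %.2fx%n", speedup(192964.487, 23451.347));
        System.out.printf("EMOJI XX_LARGE: %.2fx%n", speedup(614833.426, 409757.983));
    }
}
```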

@sugmanue sugmanue marked this pull request as ready for review March 9, 2025 05:06
@cowtowncoder
Member

@sugmanue qq: Which JDK(s) are results with?

@cowtowncoder
Member

@sugmanue Ok, sounds good; thank you for contributing this!

One thing we will eventually need before merging (although it does not block code review) is a CLA. It's here:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and needs to be done just once before the first contribution (good for any number afterwards).

The usual way is to print, fill & sign, scan/photo, email to cla at fasterxml dot com.

Looking forward to getting this reviewed, merged!

@hyandell

Hey @cowtowncoder - can you add @sugmanue to the CCLA Amazon already has with you?

(I think we usually cover this by email; let me know if you want me to follow up that way)

@cowtowncoder
Member

@hyandell No this is fine, added @sugmanue (been 3 years since last addition :) ).

Thanks!

@cowtowncoder cowtowncoder added the cla-received label Mar 12, 2025
@sugmanue
Author

> @sugmanue qq: Which JDK(s) are results with?

$ java --version
openjdk 17.0.13 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-17.0.13.11.1 (build 17.0.13+11-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.13.11.1 (build 17.0.13+11-LTS, mixed mode, sharing)
$ uname -a
Darwin c889f3b1daa2 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan  2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000 arm64

I can test on JDK8 and others, let me know.

@cowtowncoder
Member

@sugmanue I think JDK 8 would be good: but if there's speed-up on 17, it seems likely 21 would see some too. But those (8, 21) are the ones to test if it's easy enough.

I hope to review this soon, and since we have CCLA we should be good to go once reviewed.

/**
 * A pointer to know where to write text when we share an output buffer across methods
 */
protected int _sharedOutBufferPtr;
Member

@cowtowncoder cowtowncoder Mar 12, 2025

No, should not add this as state -- pointer should be passed along as needed (along with output buffer itself), if possible.

Author

Yeah, I didn't like that either. The problem is that the methods that make use of it need to return this value, as well as the pointer to the current buffer (which can change inside the method if it's replaced after becoming full), alongside whatever value needs to go back to the caller (e.g., a boolean for success).
Initially I created a small static class for this, e.g.,

static class OutBufState {
   char[] _outBuf;
   int _outBufPtr;
}

If this seems better I can go back to that option. I like it better but wasn't sure about introducing a new class inside this one. Let me know.
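
To make the trade-off concrete, here is a rough sketch of how a holder like the one above might be used. It is not the PR's code: `OutBufStateSketch`, `appendAscii` and the grow-by-doubling step are assumptions for illustration (the actual parser presumably swaps in a new segment from its text buffer rather than copying).

```java
import java.util.Arrays;

// Hypothetical holder, modeled on the OutBufState idea; not the PR's code.
final class OutBufStateSketch {
    char[] outBuf;
    int outPtr;

    OutBufStateSketch(int initialCapacity) {
        outBuf = new char[Math.max(1, initialCapacity)];
    }

    /**
     * Appends in[start, end) while the bytes are ASCII. Returns true if the
     * whole range was ASCII, false at the first non-ASCII byte. Either way the
     * holder carries the (possibly replaced) buffer and the updated pointer,
     * so the boolean return value stays free for the success flag.
     */
    boolean appendAscii(byte[] in, int start, int end) {
        char[] buf = outBuf; // work on locals inside the loop
        int ptr = outPtr;
        boolean allAscii = true;
        for (int i = start; i < end; i++) {
            byte b = in[i];
            if (b < 0) {
                allAscii = false;
                break;
            }
            if (ptr == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // assumption: simple doubling
            }
            buf[ptr++] = (char) b;
        }
        outBuf = buf; // publish buffer and pointer back to the holder
        outPtr = ptr;
        return allAscii;
    }
}
```

Compared with a field on the parser, the holder keeps the buffer and pointer together and scoped to the call chain, at the cost of one extra class and one extra object.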

Member

Ah. Yes, that makes sense, was guessing there has to be a reason.

And I think adding a class is a bit more intrusive. Let me think about this a bit and see.

Member

@sugmanue Ok, one other thought before giving up on this -- TextBuffer already has _currentSize to go with _currentSegment:

    public int getCurrentSegmentSize() { return _currentSize; }
    public void setCurrentLength(int len) { _currentSize = len; }

so perhaps that could be used instead, to sync output pointer?
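
A rough sketch of that idea, for illustration only: `SegmentBufferSketch` below is a minimal stand-in that mirrors the two accessors quoted above plus a getter for the segment; it is not Jackson's `TextBuffer`, and the `appendAscii` helper is hypothetical. Treating a full segment as a reason to return `false` is also a simplification.

```java
// Minimal stand-in for illustration; NOT Jackson's TextBuffer.
final class SegmentBufferSketch {
    private final char[] currentSegment;
    private int currentSize;

    SegmentBufferSketch(int capacity) {
        currentSegment = new char[capacity];
    }

    char[] getCurrentSegment() { return currentSegment; }
    int getCurrentSegmentSize() { return currentSize; }
    void setCurrentLength(int len) { currentSize = len; }

    /**
     * Hypothetical helper: reads the output pointer from the buffer on entry
     * and stores it back on exit, so no extra parser field is needed.
     * Returns false on the first non-ASCII byte or when the segment is full.
     */
    static boolean appendAscii(SegmentBufferSketch tb, byte[] in, int start, int end) {
        char[] out = tb.getCurrentSegment();
        int outPtr = tb.getCurrentSegmentSize();
        boolean ok = true;
        for (int i = start; i < end; i++) {
            byte b = in[i];
            if (b < 0 || outPtr == out.length) { // non-ASCII, or segment full
                ok = false;
                break;
            }
            out[outPtr++] = (char) b;
        }
        tb.setCurrentLength(outPtr); // sync the pointer back for the next method
        return ok;
    }
}
```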

Author

Sure, I will take a look. I tried with _inputStart which seemed a good candidate but it has different semantics. I will check and let you know. Thanks

@sugmanue
Author

sugmanue commented Mar 13, 2025

> @sugmanue I think JDK 8 would be good: but if there's speed-up on 17, it seems likely 21 would see some too. But those (8, 21) are the ones to test if it's easy enough.
>
> I hope to review this soon, and since we have CCLA we should be good to go once reviewed.

This one is for Java 8

java -version
openjdk version "1.8.0_432"
OpenJDK Runtime Environment Corretto-8.432.06.1 (build 1.8.0_432-b06)
OpenJDK 64-Bit Server VM Corretto-8.432.06.1 (build 25.432-b06, mixed mode)

Results

Benchmark                (flavor)      (size)  Mode  Cnt       Score       Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     281.739 ±     5.083  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     272.057 ±     6.889  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     725.747 ±    13.185  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     452.181 ±     6.315  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5    1077.382 ±    35.361  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5     575.748 ±     5.510  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   52605.983 ±  1048.369  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   13688.755 ±   143.455  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5  181696.755 ±  3108.360  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5   24969.362 ±   361.995  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5  178847.373 ±  2207.483  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5   21627.922 ±   520.888  ns/op (after)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     386.993 ±     8.277  ns/op (before)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     399.116 ±    11.299  ns/op (after)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1731.573 ±    11.454  ns/op (before)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1749.517 ±    42.673  ns/op (after)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    3080.093 ±    39.328  ns/op (before)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    3138.279 ±    64.700  ns/op (after)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  193589.685 ±  2960.494  ns/op (before)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  192334.957 ±  4207.840  ns/op (after)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  364793.365 ±  9109.322  ns/op (before)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  369629.475 ±  5089.092  ns/op (after)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  369833.290 ±  9019.027  ns/op (before)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  375344.596 ± 11702.724  ns/op (after)

UPDATE (Java 21)

java -version
openjdk version "21.0.6" 2025-01-21 LTS
OpenJDK Runtime Environment Corretto-21.0.6.7.1 (build 21.0.6+7-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.6.7.1 (build 21.0.6+7-LTS, mixed mode, sharing)

Results

Benchmark                (flavor)      (size)  Mode  Cnt       Score      Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     305.784 ±    7.608  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     295.670 ±    1.224  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     803.021 ±    9.585  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     438.636 ±    5.827  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5    1203.027 ±   17.671  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5     549.468 ±    6.743  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   55381.435 ±  822.252  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   13772.383 ±  119.884  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5  108639.201 ± 1172.755  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5   26644.873 ±  241.236  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5  111345.999 ±  312.178  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5   25099.956 ±  412.797  ns/op (after)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     384.375 ±    8.567  ns/op (before)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     393.031 ±    9.199  ns/op (after)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1427.027 ±   31.548  ns/op (before)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1478.020 ±  117.551  ns/op (after)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2554.348 ±   40.087  ns/op (before)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2531.857 ±   25.393  ns/op (after)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  175976.767 ± 2245.392  ns/op (before)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  175147.316 ± 1050.705  ns/op (after)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  391175.946 ± 2884.534  ns/op (before)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  346638.446 ± 5075.774  ns/op (after)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  375395.869 ± 4271.745  ns/op (before)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  384483.734 ± 4227.387  ns/op (after)

@cowtowncoder
Member

Ok this looks good. So, basically: good improvements to ASCII, esp. larger ones; no detrimental effect on tested non-ASCII.

@sugmanue
Author

> Ok this looks good. So, basically: good improvements to ASCII, esp. larger ones; no detrimental effect on tested non-ASCII.

Yes, there's another case that I think is worth optimizing for: mostly ASCII with a few non-ASCII characters here and there. I didn't have time to look closely into how to do it; if this works, I can send a follow-up for that case if I find a good way to do it.
