Improve ASCII performance #568

sugmanue · 2025-03-09T04:41:36Z

Summary

Improve the ASCII case by creating a tight loop around it. All the changes follow a similar pattern: first attempt a tight loop over ASCII bytes, and fall back to the existing path whenever a non-ASCII char is found.
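To illustrate the pattern, here is a minimal standalone sketch. It is not the code from this PR: the class and method names (`AsciiFastPathSketch`, `copyAscii`, `decode`) are made up for illustration, and the fallback simply hands the remainder to the JDK's UTF-8 decoder rather than to the parser's multi-byte path.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of "tight ASCII loop first, fall back on first non-ASCII byte".
class AsciiFastPathSketch {

    /**
     * Copies leading ASCII bytes from in[inPtr..end) into out, starting at outPtr.
     * Stops at the first byte with the high bit set and returns the number of bytes consumed.
     */
    static int copyAscii(byte[] in, int inPtr, int end, char[] out, int outPtr) {
        int i = inPtr;
        while (i < end) {
            byte b = in[i];
            if (b < 0) { // high bit set: start of a multi-byte sequence
                break;
            }
            out[outPtr++] = (char) b;
            i++;
        }
        return i - inPtr;
    }

    /** Decodes UTF-8 bytes, using the fast loop for the ASCII prefix. */
    static String decode(byte[] utf8) {
        // UTF-8 never decodes to more chars than it has bytes, so this is large enough.
        char[] out = new char[utf8.length];
        int n = copyAscii(utf8, 0, utf8.length, out, 0);
        if (n == utf8.length) {
            return new String(out, 0, n); // pure ASCII: the slow path is never entered
        }
        // Fallback (assumption for this sketch): general decoder for the remainder.
        String tail = new String(utf8, n, utf8.length - n, StandardCharsets.UTF_8);
        return new String(out, 0, n) + tail;
    }

    public static void main(String[] args) {
        System.out.println(decode("plain ascii".getBytes(StandardCharsets.UTF_8)));
        System.out.println(decode("caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The fast loop only tests the sign of each byte, which is what keeps it tight. The ASCII prefix always ends on a character boundary, so the remainder can be decoded independently.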

These changes show improvements of up to about 8x for the ASCII case (the X_LARGE and XX_LARGE rows below), and the multi-byte code path also improves at the XX_LARGE sizes.

The _2 cases use the same sizes but without chunking; that CBOR was created using json2cbor to avoid chunking.

Benchmark Results

All the benchmarks can be found here.

Benchmark                (flavor)      (size)  Mode  Cnt       Score       Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     283.767 ±     6.141  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     276.760 ±     4.900  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     751.102 ±     9.724  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     412.084 ±     0.906  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5    1162.698 ±    34.077  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5     537.463 ±     1.371  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   97592.433 ±   652.295  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   12433.531 ±    55.798  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5  192964.487 ±   764.024  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5   23451.347 ±    57.090  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5  192113.905 ±  1270.924  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5   24381.351 ±   601.883  ns/op (after)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     352.329 ±     8.878  ns/op (before)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     369.720 ±     8.236  ns/op (after)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1393.845 ±     9.093  ns/op (before)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1477.070 ±    14.845  ns/op (after)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2492.102 ±   145.094  ns/op (before)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2634.623 ±    20.539  ns/op (after)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  313477.398 ±  3187.925  ns/op (before)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  309304.797 ± 10424.273  ns/op (after)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  614833.426 ± 12688.680  ns/op (before)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  409757.983 ±  5656.776  ns/op (after)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  775988.821 ±  8871.677  ns/op (before)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  381908.757 ±   855.804  ns/op (after)

cowtowncoder · 2025-03-10T23:50:26Z

@sugmanue qq: Which JDK(s) are results with?

cowtowncoder · 2025-03-10T23:52:36Z

@sugmanue Ok, sounds good; thank you for contributing this!

One thing to do before merging (although it does not block code review) is the CLA, which we will eventually need. It's here:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and needs to be done just once before the first contribution (good for any number afterwards).

The usual way is to print, fill & sign, scan/photo, email to cla at fasterxml dot com.

Looking forward to getting this reviewed, merged!

hyandell · 2025-03-12T03:51:14Z

Hey @cowtowncoder - can you add @sugmanue to the CCLA Amazon already has with you?

(I think we usually cover this by email; let me know if you want me to follow up that way)

cowtowncoder · 2025-03-12T04:13:23Z

@hyandell No this is fine, added @sugmanue (been 3 years since last addition :) ).

Thanks!

sugmanue · 2025-03-12T15:44:51Z

> @sugmanue qq: Which JDK(s) are results with?

$ java --version
openjdk 17.0.13 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-17.0.13.11.1 (build 17.0.13+11-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.13.11.1 (build 17.0.13+11-LTS, mixed mode, sharing)
$ uname -a
Darwin c889f3b1daa2 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan  2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000 arm64

I can test on JDK8 and others, let me know.

cowtowncoder · 2025-03-12T16:13:51Z

@sugmanue I think JDK 8 would be good: but if there's speed-up on 17, it seems likely 21 would see some too. But those (8, 21) are the ones to test if it's easy enough.

I hope to review this soon, and since we have CCLA we should be good to go once reviewed.

cowtowncoder · 2025-03-12T21:18:18Z

cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java

+    /**
+     * A pointer to know where to write text when we share an output buffer across methods
+     */
+    protected int _sharedOutBufferPtr;


No, should not add this as state -- pointer should be passed along as needed (along with output buffer itself), if possible.

sugmanue · Mar 12, 2025

Yeah, I didn't like that either. The problem is that the methods that use it need to return this value, plus the pointer to the current buffer (which can change inside the method if the buffer is replaced after filling up), alongside whatever value the method has to return to its caller (e.g., a boolean for success).
Initially I created a small static class for this, e.g.,

static class OutBufState {
    char[] _outBuf;
    int _outBufPtr;
}

If this seems better I can go back to that option. I like it better but wasn't sure about introducing a new class inside this one. Let me know.
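For readers following the trade-off, a rough sketch of the holder-class option follows. Everything here is hypothetical: `OutBufStateSketch` and `appendAscii` are illustrative names, and the buffer is simply grown with `Arrays.copyOf` where the real parser would swap in a new segment.

```java
import java.util.Arrays;

// Hypothetical sketch: a small holder lets a method return a boolean
// while still reporting the (possibly replaced) buffer and the write position.
class OutBufStateSketch {

    /** Illustrative holder, mirroring the OutBufState idea mentioned above. */
    static final class OutBufState {
        char[] outBuf;
        int outBufPtr;

        OutBufState(int capacity) {
            outBuf = new char[capacity];
        }
    }

    /**
     * Appends ASCII bytes from in[start..end) to state.outBuf, replacing the buffer when full.
     * Returns true if the whole range was ASCII, false if it stopped at a non-ASCII byte.
     */
    static boolean appendAscii(byte[] in, int start, int end, OutBufState state) {
        char[] buf = state.outBuf;   // work on locals inside the hot loop
        int ptr = state.outBufPtr;
        boolean allAscii = true;
        for (int i = start; i < end; i++) {
            byte b = in[i];
            if (b < 0) {
                allAscii = false;
                break;
            }
            if (ptr == buf.length) {
                // Assumption for this sketch: grow in place instead of starting a new segment.
                buf = Arrays.copyOf(buf, Math.max(16, buf.length * 2));
            }
            buf[ptr++] = (char) b;
        }
        // Publish both values back so the caller sees a replaced buffer too.
        state.outBuf = buf;
        state.outBufPtr = ptr;
        return allAscii;
    }
}
```

The caller allocates one `OutBufState` and reuses it across calls, so the extra object costs nothing per token. The downside raised in the review is that it adds a new type to the parser.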

cowtowncoder · Mar 13, 2025

Ah. Yes, that makes sense; I was guessing there had to be a reason.

And I think adding a class is a bit more intrusive. Let me think about this a bit and see.

cowtowncoder · Mar 19, 2025

@sugmanue Ok, one other thought before giving up on this -- TextBuffer already has _currentSize to go with _currentSegment:

public int getCurrentSegmentSize() { return _currentSize; }
public void setCurrentLength(int len) { _currentSize = len; }

so perhaps that could be used instead, to sync output pointer?
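One way to read this suggestion, sketched with a stand-in buffer class so the example runs on its own. `MiniTextBuffer` is hypothetical and only borrows the accessor names quoted above; its `finishCurrentSegment` and `contentsAsString` are simplified assumptions, not the real TextBuffer behavior.

```java
// Hypothetical sketch: the output position is synced through the text buffer,
// so helper methods need no extra parser field and no holder class.
class SegmentSizeSyncSketch {

    /** Stand-in for TextBuffer; assumes segmentLength > 0. */
    static final class MiniTextBuffer {
        private final StringBuilder finished = new StringBuilder();
        private char[] currentSegment;
        private int currentSize;

        MiniTextBuffer(int segmentLength) {
            currentSegment = new char[segmentLength];
        }

        char[] getCurrentSegment() { return currentSegment; }
        int getCurrentSegmentSize() { return currentSize; }
        void setCurrentLength(int len) { currentSize = len; }

        /** Archives the (full) current segment and starts a new, empty one. */
        char[] finishCurrentSegment() {
            finished.append(currentSegment, 0, currentSegment.length);
            currentSegment = new char[currentSegment.length];
            currentSize = 0;
            return currentSegment;
        }

        String contentsAsString() {
            return finished.toString() + new String(currentSegment, 0, currentSize);
        }
    }

    /** Returns only a boolean; buffer and write position travel through tb. */
    static boolean appendAscii(byte[] in, int start, int end, MiniTextBuffer tb) {
        char[] buf = tb.getCurrentSegment();
        int ptr = tb.getCurrentSegmentSize(); // resume where the previous call stopped
        boolean allAscii = true;
        for (int i = start; i < end; i++) {
            byte b = in[i];
            if (b < 0) {
                allAscii = false;
                break;
            }
            if (ptr == buf.length) {
                buf = tb.finishCurrentSegment();
                ptr = 0;
            }
            buf[ptr++] = (char) b;
        }
        tb.setCurrentLength(ptr); // sync the pointer back before returning
        return allAscii;
    }
}
```

Each call re-reads the segment and its size on entry and writes the size back on exit, so consecutive calls continue from the right position even when a segment was finished in between.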

sugmanue · Mar 19, 2025

Sure, I will take a look. I tried _inputStart, which seemed a good candidate, but it has different semantics. I will check and let you know. Thanks.


sugmanue · 2025-03-13T03:47:32Z

> @sugmanue I think JDK 8 would be good: but if there's speed-up on 17, it seems likely 21 would see some too. But those (8, 21) are the ones to test if it's easy enough.
>
> I hope to review this soon, and since we have CCLA we should be good to go once reviewed.

This one is for Java 8

$ java -version
openjdk version "1.8.0_432"
OpenJDK Runtime Environment Corretto-8.432.06.1 (build 1.8.0_432-b06)
OpenJDK 64-Bit Server VM Corretto-8.432.06.1 (build 25.432-b06, mixed mode)

Results

Benchmark                (flavor)      (size)  Mode  Cnt       Score       Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     281.739 ±     5.083  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     272.057 ±     6.889  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     725.747 ±    13.185  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     452.181 ±     6.315  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5    1077.382 ±    35.361  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5     575.748 ±     5.510  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   52605.983 ±  1048.369  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   13688.755 ±   143.455  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5  181696.755 ±  3108.360  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5   24969.362 ±   361.995  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5  178847.373 ±  2207.483  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5   21627.922 ±   520.888  ns/op (after)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     386.993 ±     8.277  ns/op (before)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     399.116 ±    11.299  ns/op (after)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1731.573 ±    11.454  ns/op (before)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1749.517 ±    42.673  ns/op (after)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    3080.093 ±    39.328  ns/op (before)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    3138.279 ±    64.700  ns/op (after)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  193589.685 ±  2960.494  ns/op (before)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  192334.957 ±  4207.840  ns/op (after)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  364793.365 ±  9109.322  ns/op (before)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  369629.475 ±  5089.092  ns/op (after)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  369833.290 ±  9019.027  ns/op (before)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  375344.596 ± 11702.724  ns/op (after)

UPDATE (Java 21)

$ java -version
openjdk version "21.0.6" 2025-01-21 LTS
OpenJDK Runtime Environment Corretto-21.0.6.7.1 (build 21.0.6+7-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.6.7.1 (build 21.0.6+7-LTS, mixed mode, sharing)

Results

Benchmark                (flavor)      (size)  Mode  Cnt       Score      Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     305.784 ±    7.608  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       SMALL  avgt    5     295.670 ±    1.224  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     803.021 ±    9.585  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE      MEDIUM  avgt    5     438.636 ±    5.827  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5    1203.027 ±   17.671  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE       LARGE  avgt    5     549.468 ±    6.743  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   55381.435 ±  822.252  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE     X_LARGE  avgt    5   13772.383 ±  119.884  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5  108639.201 ± 1172.755  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE    XX_LARGE  avgt    5   26644.873 ±  241.236  ns/op (after)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5  111345.999 ±  312.178  ns/op (before)
MyBenchmark.cbor  ASCII_PRINTABLE  XX_LARGE_2  avgt    5   25099.956 ±  412.797  ns/op (after)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     384.375 ±    8.567  ns/op (before)
MyBenchmark.cbor            EMOJI       SMALL  avgt    5     393.031 ±    9.199  ns/op (after)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1427.027 ±   31.548  ns/op (before)
MyBenchmark.cbor            EMOJI      MEDIUM  avgt    5    1478.020 ±  117.551  ns/op (after)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2554.348 ±   40.087  ns/op (before)
MyBenchmark.cbor            EMOJI       LARGE  avgt    5    2531.857 ±   25.393  ns/op (after)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  175976.767 ± 2245.392  ns/op (before)
MyBenchmark.cbor            EMOJI     X_LARGE  avgt    5  175147.316 ± 1050.705  ns/op (after)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  391175.946 ± 2884.534  ns/op (before)
MyBenchmark.cbor            EMOJI    XX_LARGE  avgt    5  346638.446 ± 5075.774  ns/op (after)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  375395.869 ± 4271.745  ns/op (before)
MyBenchmark.cbor            EMOJI  XX_LARGE_2  avgt    5  384483.734 ± 4227.387  ns/op (after)

cowtowncoder · 2025-03-19T02:39:07Z

Ok this looks good. So, basically: good improvements to ASCII, esp. larger ones; no detrimental effect on tested non-ASCII.

sugmanue · 2025-03-19T20:28:10Z

> Ok this looks good. So, basically: good improvements to ASCII, esp. larger ones; no detrimental effect on tested non-ASCII.

Yes. There's another case that I think is worth optimizing for: mostly ASCII with a few non-ASCII chars here and there. I didn't have time to look closely into how to do it, but if this PR works out I can send a follow-up for that case if I find a good way to handle it.

Improve ASCII performance

2940ef9

sugmanue mentioned this pull request Mar 9, 2025

JSON vs CBOR performance for ASCII text #519

Open

sugmanue marked this pull request as ready for review March 9, 2025 05:06

cowtowncoder added the cla-received label (marker to denote that there is a CLA for the PR) Mar 12, 2025

cowtowncoder reviewed Mar 12, 2025
