-
-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ASCII performance #568
base: 2.19
Are you sure you want to change the base?
Improve ASCII performance #568
Conversation
@sugmanue qq: Which JDK(s) are results with? |
@sugmanue Ok, sounds good; thank you for contributing this! One thing to do before merging (although not blocking code review) that we eventually needs is CLA. It's here: https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf and needs to be done just once before the first contribution (good for any number afterwards). The usual way is to print, fill & sign, scan/photo, email to Looking forward to getting this reviewed, merged! |
Hey @cowtowncoder - can you add @sugmanue to the CCLA Amazon already has with you? (I think we usually cover this by email; let me know if you want me to follow up that way) |
I can test on JDK8 and others, let me know. |
@sugmanue I think JDK 8 would be good: but if there's speed-up on 17, it seems likely 21 would see some too. But those (8, 21) are the ones to test if it's easy enough. I hope to review this soon, and since we have CCLA we should be good to go once reviewed. |
/** | ||
* A pointer to know where to write text when we share an output buffer across methods | ||
*/ | ||
protected int _sharedOutBufferPtr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, should not add this as state -- pointer should be passed along as needed (along with output buffer itself), if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't like that either, the problem is that the method that make use of it need to return this value, the pointer to the current buffer (that can change inside the method is it's replace after being full), along side with whatever value needs to respond back to its caller (e.g., boolean for success).
Initially I created a small static class for this, e.g.,
static class OutBufState {
char[] _outBuf;
int _outBufPtr;
}
If this seems better I can go back to that option. I like it better but wasn't sure about introducing a new class inside this one. Let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Yes, that makes sense, was guessing there has to be a reason.
And I think adding class is bit more intrusive. Let me think about this a bit and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sugmanue Ok, one other thought before giving up on this -- TextBuffer
already has _currentSize
to go with _currentSegment
:
public int getCurrentSegmentSize() { return _currentSize; }
public void setCurrentLength(int len) { _currentSize = len; }
so perhaps that could be used instead, to sync output pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will take a look. I tried with _inputStart
which seemed a good candidate but it has different semantics. I will check and let you know. Thanks
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java
Show resolved
Hide resolved
This one is for Java 8
Results
UPDATE (Java 21)
Results
|
Ok this looks good. So, basically: good improvements to ASCII, esp. larger ones; no detrimental effect on tested non-ASCII. |
Yes, there's a another case that I'd think is worth optimizing for, mostly ASCII but few non-ASCII here and there, but didn't have time to look closely into how to do it, and if this works I can then send a follow up for that case if I find a good way for it. |
Summary
Improve the ASCII case by creating a tight loop around it. All the changes follows a similar pattern. First attempt to do a tight loop around ASCII and fallback whenever a non-ascii char is found.
These changes shows improvements of up to 7x for the ASCII case, but also for the multi-byte code path.
The
_2
cases are for the same sizes but without chunking. The CBOR was created using json2cbor to avoid chunking. All these benchmark test can be found hereBenchmark Results
All the benchmarks can be found here.