Faster streaming 'jsonl' parser #3829
Nice! I would expect the non-streaming version to be faster for small files like the one you benchmarked on, and would have advocated making it the default for small files. For completeness, can you try some larger files like https://github.com/codeql-dca-runners/codeql-dca-worker_javascript/actions/runs/11840886420/artifacts/2188663427 (amphtml on https://github.com/github/codeql-dca-main/issues/24803) (you need to run
(On second thought, that will be a lot of work, so don't worry about it.)
I ran it anyway:
Excellent. That is indeed convincing!
Wow. This is great. Is this something you want to contribute to the main branch? Also, do you have a hypothesis why
My two guesses:
Yes, the intent is to get this into main along with the 'Compare Performance' UI. But for benchmarking purposes it was easier to target the hackathon branch initially, since that's where the previous streaming implementation lived.
Although nobody formally 'Approved' the PR, I'm going to merge to the hackathon branch so I can start preparing the PR against main.
Replaces the streaming JSONL parser with a faster one that doesn't call `readline`. The streaming JSONL parser on our branch was slower than the original sync version (but doesn't run out of memory), and it seems `readline` is a bottleneck:

Running the benchmark on a 21 MB logfile:

- `readJsonlReferenceImpl`: 172.4 ms (original non-streaming version)
- `readJsonlFile`: 283.3 ms (streaming version based on `readline`)
- `readJsonlFile2`: 151.3 ms (🚀 new version without `readline`)
- `justReadline`: 187.5 ms (consumes the file with `readline` and nothing else)

Note: targeting the hackathon branch. I think it's best to polish that up and try to merge it into `main`.
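
For readers who want a picture of what "streaming without `readline`" can look like, here is a minimal sketch. It is not the code from this PR: the names `JsonlSplitter` and `readJsonlStream` are hypothetical, and it assumes strict JSON Lines input (one JSON value per line, UTF-8). The log format this PR actually parses may use a different record separator. The idea is to read raw chunks from the file stream, keep the unfinished tail of each chunk in a buffer, and parse each line as soon as its newline arrives.

```typescript
import { createReadStream } from "node:fs";

// Hypothetical helper (not from the PR): buffers decoded text chunks and
// returns one parsed value per completed line.
export class JsonlSplitter {
  // Text after the last newline seen so far, i.e. an incomplete line.
  private pending = "";

  // Feed one decoded chunk; returns the values of every line it completes.
  push(chunk: string): unknown[] {
    const text = this.pending + chunk;
    const values: unknown[] = [];
    let start = 0;
    let newline = text.indexOf("\n", start);
    while (newline !== -1) {
      // trim() also drops a trailing "\r" from CRLF line endings.
      const line = text.slice(start, newline).trim();
      if (line.length > 0) {
        values.push(JSON.parse(line));
      }
      start = newline + 1;
      newline = text.indexOf("\n", start);
    }
    this.pending = text.slice(start);
    return values;
  }

  // Flush a final line that has no trailing newline.
  end(): unknown[] {
    const line = this.pending.trim();
    this.pending = "";
    return line.length > 0 ? [JSON.parse(line)] : [];
  }
}

// Streams a file chunk by chunk, so memory use is bounded by the chunk size
// plus the longest single line, not by the size of the whole file.
export async function readJsonlStream(
  path: string,
  handler: (value: unknown) => Promise<void> | void,
): Promise<void> {
  const splitter = new JsonlSplitter();
  // With an encoding set, the stream yields strings and handles multi-byte
  // characters that straddle chunk boundaries.
  const stream = createReadStream(path, { encoding: "utf8" });
  for await (const chunk of stream) {
    for (const value of splitter.push(chunk as string)) {
      await handler(value);
    }
  }
  for (const value of splitter.end()) {
    await handler(value);
  }
}
```

Keeping the splitting logic in a small synchronous class makes it easy to test without touching the file system. Whether this approach reproduces the timings quoted above is not something this sketch claims.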