Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser improvements #5

Merged
merged 10 commits into from
Jun 16, 2023
Merged

Conversation

fabi321
Copy link
Contributor

@fabi321 fabi321 commented Jun 15, 2023

Multiple parser improvements,both on the readability, as well as performance side.

The new SIMD hex parser has one major drawback: it is undefined behavior for invalid characters, instead of 0.

@fabi321
Copy link
Contributor Author

fabi321 commented Jun 15, 2023

The alpha blending could probably benefit from SIMD usage, too, but I'm kinda waiting for benches that include alpha for this

@sbernauer
Copy link
Owner

Really awesome, I like!

I think the UB is fine, when the user provides garbage values it's ok to color the pixel with some random value I guess.

I want to write some notes for README (e.g. that we need to use nightly rust) before merging this.
Had some trouble getting reproducible benchmark results, do you have speedup numbers from your machine?
I would also be interested in the emitted assembler code using cargo asm --rust breakwater::parser::simd_unhex, but it can't find the specified function besides all my effort of #[inline(never)] and calling simd_unhex directly from main.

@sbernauer
Copy link
Owner

Nice time-travel btw!
image

@fabi321
Copy link
Contributor Author

fabi321 commented Jun 15, 2023

As I mentioned in one of the commits, the speedup on my machine was, averaging across many runs, from 13.5ms to 11ms. My bench results have been pretty stable at 11.05±0.1ms

@fabi321
Copy link
Contributor Author

fabi321 commented Jun 15, 2023

I also think that UB is fine, that's why I used in the first place, it's just something that is different now due to the way how the conversion happens, as it is no longer a lookup, but rather a calculation that is only defined for 0123456789abcdefABCDEF

Copy link
Owner

@sbernauer sbernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to merge this PR as-is, but have made some changes to the Rust toolchain and README here.
I would leave it up to you if you want pull in the 3 commits from https://github.com/sbernauer/breakwater/tree/parser-improvements, or if I should push them afterwards

@fabi321
Copy link
Contributor Author

fabi321 commented Jun 15, 2023

Regarding the assembler, this is the function as IDA disassebles it:

u32 __fastcall breakwater::parser::simd_unhex::h52042561b542debc(__u8_ value, __m128 _XMM0)
{
  u32 result; // eax
  core::option::Option<core::fmt::Arguments> args; // [rsp+0h] [rbp-38h] BYREF

  *(_QWORD *)args.gap0 = value.length;
  if ( value.length != 8 )
  {
    *(_QWORD *)&args.gap0[8] = 0LL;
    core::panicking::assert_failed::h676f2fb9f137bf56(Eq, (usize *)&args, (usize *)"\b", args);
  }
  __asm
  {
    vpmovzxbd ymm0, qword ptr [rdi]
    vpbroadcastd ymm2, cs:dword_2EE7B4
    vpsrld  ymm1, ymm0, 6
    vpmaddwd ymm1, ymm1, cs:ymmword_2EE7C0
    vpand   ymm0, ymm0, ymm2
    vpaddd  ymm0, ymm1, ymm0
    vpsllvd ymm0, ymm0, cs:ymmword_2EE7E0
    vextracti128 xmm1, ymm0, 1
    vpor    xmm0, xmm0, xmm1
    vpshufd xmm1, xmm0, 0EEh
    vpor    xmm0, xmm0, xmm1
    vpshufd xmm1, xmm0, 55h ; 'U'
    vpor    xmm0, xmm0, xmm1
    vmovd   eax, xmm0
    vzeroupper
  }
  return result;
}

@fabi321
Copy link
Contributor Author

fabi321 commented Jun 15, 2023

@sbernauer I pulled your commits, and it is ready to be merged.

@fabi321
Copy link
Contributor Author

fabi321 commented Jun 15, 2023

One note about parsing coordinates using simd: as they are variable length, it will be hard if not impossible to do. That was one of the reasons why I picked colors as my first target, as their length is known in the code path.

@sbernauer
Copy link
Owner

That were exactly my thoughts! Maybe we can use some bitmasks to determine the coordinate length and afterwards use specialized SIMD-instructions. Event cooler would be to parse both coordinates simultaneous, but we are getting ahead of ourselves ^^

@sbernauer
Copy link
Owner

Thanks for putting this up!

@sbernauer
Copy link
Owner

Have to come back this before merging to get the ci checks green:

  • Switch ci to use nightly rust.
  • Switch the docker build to not use cpu-specific features as it should run everywhere

Will try to get it done today

@fabi321
Copy link
Contributor Author

fabi321 commented Jun 16, 2023

I have played around with SIMD digit parsing now, in a separate branch (https://github.com/fabi321/breakwater/tree/simd-digit-parsing). However, it is significantly slower as of now. it takes about 20.5ms. I also tried to confuse the branch predictor a bit, shuffling the commands, but that puts the current solution at 18.75ms, still faster than my simd one. So either, I find a way to fix the performance issues, or it will have to stay like this.

@sbernauer
Copy link
Owner

Sounds really interesting, I think I will give it a try as well. I now also installed Linux on (~10 year old) Desktop to get more reliable benchmarks.
I would merge this PR now and fix the CI checks in main (easier to test) We can improve things in a new PR I would say

@sbernauer sbernauer merged commit 61a1654 into sbernauer:master Jun 16, 2023
2 of 7 checks passed
@sbernauer
Copy link
Owner

Btw, had a -8.2877% improvement (27.359ms vs 25.092ms) on my Desktop 👍 (which sadly only support avx, no avx2 or above)

@fabi321 fabi321 deleted the parser-improvements branch June 16, 2023 21:29
@fabi321 fabi321 restored the parser-improvements branch June 16, 2023 21:41
@fabi321 fabi321 deleted the parser-improvements branch June 20, 2023 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants