Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not pass -m flags when compiling shuffle.c #622

Merged
merged 8 commits into from
Jul 4, 2024

Conversation

mgorny
Copy link
Contributor

@mgorny mgorny commented Jun 25, 2024

Do not pass -msse2, -mavx2, etc. flags to the compiler when compiling shuffle.c. From what I can see, the file itself does not use any of these intrinsics, and they are only used by functions declared in bitshuffle-*.c and shuffle-*.c (where the respective flags are still passed). This prevents the compiler from incidentally optimizing the code called independenlty of the runtime CPU check to these instruction sets, effectively causing SIGILL on other CPUs.

I have verified that this fixes the issue on -march=znver2, but also does not cause any issues on -march=x86-64 and -march=i686.

Fixes #621

@FrancescAlted
Copy link
Member

Thanks @mgorny for this. However, your pull request will not use AVX2 in machines having it (you can easily check that by putting a printf() in https://github.com/Blosc/c-blosc2/blob/main/blosc/shuffle.c#L312). This is because how the dynamic dispatching has been implemented inside blosc2; maybe there is a better way for implementing the dynamic dispatching, but again, your PR, as-is, is effectively disabling the use of AVX2.

Unfortunately AVX2 can greatly accelerate shuffle/unshuffle as well as bitshuffle/bitunshuffle, and it would be bad if users cannot leverage this capability anymore.

Also, @t20100 may have something to say here.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

Oh, I'm sorry about that. I see the problem now, the SHUFFLE_USE* macros are the problem. Apparently they were added in dd57c03, but the commit message doesn't really give an example of the problem they solved. In fact, they explicitly contradict CMakeLists.txt:

# Define a symbol for the shuffle-dispatch implementation
# so it knows AVX2 is supported even though that file is
# compiled without AVX2 support (for portability).

FWICS top-level CMakeLists.txt has a pretty specific list of compilers supporting AVX2, etc., so I'm surprised they would need to be double-checked like that. Of course, it would be much cleaner to perform a compile check rather than hardcode a list of compilers.

Do not pass `-msse2`, `-mavx2`, etc. flags to the compiler when
compiling `shuffle.c`.  From what I can see, the file itself does not
use any of these intrinsics, and they are only used by functions
declared in `bitshuffle-*.c` and `shuffle-*.c` (where the respective
flags are still passed).  This prevents the compiler from incidentally
optimizing the code called independenlty of the runtime CPU check to
these instruction sets, effectively causing `SIGILL` on other CPUs.

I have verified that this fixes the issue on `-march=znver2`, but also
does not cause any issues on `-march=x86-64` and `-march=i686`.

Fixes Blosc#621
@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

I've updated the pull request to fix the immediate issue. I can work on it further when I know what's the issue that prompted dd57c03.

@t20100
Copy link
Contributor

t20100 commented Jun 25, 2024

This commit (dd57c03) was to add support of macos universal2 builds (#431, #430). The issue in this case is that one compilation command builds both ARM and x86, so source files cannot be added conditionally to the build, nor macro defined per architecture.

On option proposed in Blosc/c-blosc#347 (Same as #431 but for c-blosc1):

An alternative could be to define blosc_internal_*_avx2 and blosc_internal_*_sse2 functions even when AVX2/SSE2 are not defined with a body calling e.g., abort().

With this shuffle.c can be built without SIMD options and all source files can be built on all architectures.

@FrancescAlted
Copy link
Member

Ok, the latest version of this PR is using AVX2 on my box, so that's great. @t20100 I am not 100% certain if you agree with this PR or if you foresee issues with universal2 builds. Can you clarify?

@t20100
Copy link
Contributor

t20100 commented Jun 25, 2024

As it is, I would expect universal2 build to no longer work and any difference between what the CMakefile probes and SIMD macros like:

#if defined(__AVX512F__) && defined (__AVX512BW__)

to lead to issues.
I'll propose a change for this to clarify.

@t20100
Copy link
Contributor

t20100 commented Jun 25, 2024

Hope this helps to clarify: t20100@b243896

It's only done for AVX2 and would need to be done for all SIMD implementations. It's maybe not necessary to do it for both shuffle and bitshuffle since the marco tests are the same.

It's basically adding a runtime test of the SIMD implementation availability and dummy functions when not available.
It still needs testing.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

Should I add these to this PR or are you going to take it from here?

@t20100
Copy link
Contributor

t20100 commented Jun 25, 2024

Should I add these to this PR or are you going to take it from here?

As you want.
I first wanted to get feedbacks before going further.

If it's OK, then you can either get my changes or I can make a PR to your branch to add it to this PR.

@FrancescAlted
Copy link
Member

@t20100 In general I like were you are headed (I just suggested to use _AVX2 suffix). It would be nice if both of you can contribute in this same PR (I don't if github makes this difficult though).

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

Ok, I'll update the PR in a few minutes.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

@t20100 In general I like were you are headed (I just suggested to use _AVX2 suffix). It would be nice if both of you can contribute in this same PR (I don't if github makes this difficult though).

FWICS all the existing functions have _AVX suffix. Should I rename them?

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

(same goes for SSE2 bits having SSE suffix)

@FrancescAlted
Copy link
Member

@t20100 In general I like were you are headed (I just suggested to use _AVX2 suffix). It would be nice if both of you can contribute in this same PR (I don't if github makes this difficult though).

FWICS all the existing functions have _AVX suffix. Should I rename them?

Oh, I forgot about this. Yeah, as they are internal functions, I think renaming is a good idea.

@FrancescAlted
Copy link
Member

(same goes for SSE2 bits having SSE suffix)

Let's do that here too.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

One more quick question: I see that some header files (but not all) declare additional functions such as bshuf_shuffle_bit_eightelem_altivec() that aren't used elsewhere. Should I remove them from the headers while at it?

@FrancescAlted
Copy link
Member

One more quick question: I see that some header files (but not all) declare additional functions such as bshuf_shuffle_bit_eightelem_altivec() that aren't used elsewhere. Should I remove them from the headers while at it?

Hmm, I'd say it is probably safe to remove those functions from the header, but I am not completely sure. Perhaps @kif can shed some light here?

@FrancescAlted
Copy link
Member

One more quick question: I see that some header files (but not all) declare additional functions such as bshuf_shuffle_bit_eightelem_altivec() that aren't used elsewhere. Should I remove them from the headers while at it?

Hmm, I'd say it is probably safe to remove those functions from the header, but I am not completely sure. Perhaps @kif can shed some light here?

Thinking twice about this, let's focus in our problem at hand, and let the cleanup to happen in another PR.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

Well, unfortunately this brings the second question: since these functions are technically present in the header, should I also stub them with abort()s? To be honest, that's some extra work that I don't really want to go through.

My rough idea for this PR would be to, in order:

  1. Remove unneeded function decls from headers.
  2. Rename functions for consistency.
  3. Add checks and stubs as requested by @t20100.
  4. And then, when everything's ready, remove the flags as this PR currently does.

@FrancescAlted
Copy link
Member

Well, unfortunately this brings the second question: since these functions are technically present in the header, should I also stub them with abort()s? To be honest, that's some extra work that I don't really want to go through.

My rough idea for this PR would be to, in order:

1. Remove unneeded function decls from headers.

2. Rename functions for consistency.

3. Add checks and stubs as requested by @t20100.

4. And then, when everything's ready, remove the flags as this PR currently does.

I see. I'm +1 on your plan then.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

Actually, I was wrong. Some of these functions are used cross-file, e.g. AVX512 uses AVX2 and SSE2 functions.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 25, 2024

@t20100, actually, could you do the part with guards? I'm entirely unfamiliar with this code, and I'm getting too easily distracted for this.

@t20100
Copy link
Contributor

t20100 commented Jun 26, 2024

OK, I'll take care of the guards part.

@t20100
Copy link
Contributor

t20100 commented Jul 3, 2024

I made a PR on this branch with stubs when SIMD are not available and runtime availability checks: mgorny#1

What do you think?

Add runtime checks for use of SIMD implementations and stubs functions if not available.
@mgorny
Copy link
Contributor Author

mgorny commented Jul 3, 2024

Thanks a lot! I've merged it here.

@FrancescAlted
Copy link
Member

Hi. I have tried this out, and it looks good to me. If there is no opposition, I'll merge this soon. Thanks @mgorny and @t20100 !

@FrancescAlted FrancescAlted merged commit f3aea9e into Blosc:main Jul 4, 2024
17 checks passed
@mgorny mgorny deleted the shuffle-flags branch July 4, 2024 15:56
@mgorny
Copy link
Contributor Author

mgorny commented Jul 4, 2024

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

shuffle.c is compiled with -mavx512* which results in SIGILL on lower CPUs
3 participants