Feature/permute rebase13 #18

bjoo · 2020-05-04T18:11:13Z

Hi All, Here are the additions for vector lane permute. Intrinsics for AVX (Single Prec), AVX512 (single prec and double prec), Generic for other CPUs, __shfl_sync for CUDA, __shfl for HIP.

I also came accross an overflow possibility with the CUDA masks when N=32 (got compiler warning) so I made that go away.

A couple of FIXME's still left in there as well as a funny for AVX512 where the permutexvar intrinsic looks at every second element of the permutation index table register, and intevening elements are zeroed out, hence the funny 8 element constructor for the simd<int,> type. Maybe worth renaming these to something like simd_permute_control_t in the future.

This was rebased onto master after #13 was merged.

The testing harness is at: https://github.com/bjoo/simd-math-testing.git and shows how the permutes are being called.

…, and single prec permute in avx.h

Guarded permute in simd_common.h as it won't work for GPU architectures FIXME: need better guards for Clang + GPU targets Fixed: any_of all_of in HIP backend Fixed: potential mask overflow in cuda_warp

crtrott · 2020-05-04T19:39:38Z

I am wondering about the constructors with a subset of elements. Since this is specifically for permute vectors, maybe a non-member function to create a permute vector would be better? I would actually prefer an interface where the permute would take a permute integer simd type of the same length as the simd value type, but I take it from you that that is too expansive since you have to convert it internally?

We need to think about what kind of interface would be acceptable to the C++ standard.

bjoo · 2020-05-04T21:12:33Z

I can make a make_permute() function call. That is a nice way of solving that oddball constructor issue. I do actually use simd<int, simd_abi::whatever> for the control, primarily for efficiency on GPUs where you want to spend maybe one register per mask, not 32. So it is filled ‘via’ simd_storage on the host where you give the full vector length permute. But the actual mask is a simd<int,…> like you desire.

We need to think about what kind of interface would be acceptable to the C++ standard.

Indeed, I never actually dared to think that far ahead. Have a look here: for AVX512: https://github.com/bjoo/simd-math-testing/blob/master/test/avx512-tests/test_simd_avx512_permute.cpp and for e.g. CUDA: https://github.com/bjoo/simd-math-testing/blob/master/test/cuda-tests/test_simd_cuda_permute.cpp I’ll go add the make_permute() function… Best, B

…

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

----------------------------------------------------------------------------- Dr Balint Joo High Performance Computational Scientist Jefferson Lab 12000 Jefferson Ave Suite 3, MS 12B2, Room F217 Newport News VA 23606, USA Tel: +1-757-269-5339 email: bjoo AT jlab.org -----------------------------------------------------------------------------

…its class

…X512 Removed weird simd<int,> initialization for AVX512 -- essentially the zeros get put into a simd_t::storage_type by make_permute now.

bjoo · 2020-05-04T23:23:35Z

Hi, as suggested by @crtrott I have modified the interface by adding make_permute.
I have also updated the test codes to see how it could be used.

Actually another thing, now that this is kinda traits-y so we have

simd::simd_utils< simd<T,Abi>::make_permute(int mask[ simd<T,Abi>::size() ] )

there is nothing in principle to make the 'int mask' a compile time template parameter i.e:

simd::simd_utils<simd<T,Abi>::template make_permute<from0,from1,....fromN>(void)

where from0 is what would otherwise be mask[0],
from1 is what would otherwise be mask[1]
etc.

however for warp-sizes of 64, that could get unwieldy.

NB: SYCL has its own way of doing shuffles with its vec<T,N> type which is templatized like this, but i) is not a warp parallel permute, and ii) currently has a maximum size of 16. There the shuffle operator returns a shuffled_vec<T,N> type and the shuffle is not actually carried out until this is assigned to another vec<T,N>. That approach could also be a way to go towards standardization too, tho what I have now is simple, and relatively portable. For SSE it would be annoying to implement the shuffle, as one would need to use _mm_shuffle_ps/pd where the masks are immediates :( So probably we'd need to do something gross like 'make_permute'
returns the immediate, and then the permute would need some horrible switch statement with something like

switch( control_value ) {
case 0 : _mm_shuffle_ps(reg1, reg1, 0x0; break;
case 1 : _mm_shuffle_ps(reg1, reg1, 0x01; break;
...
};

This gets pretty tedious to write. Ditto for AVX2 Double Prec shuffles
(_mm256_permutevar_pd is in AVX512 :( )

Let me know if you need anything else added to this.

ibaned · 2020-08-13T19:53:42Z

@crtrott @bjoo sorry for not seeing this earlier. @crtrott did @bjoo sufficiently address your feedback? If so I can look at merging.

Balint Joo and others added 3 commits May 4, 2020 12:46

Added general permute in simd_common.hpp. Added integer simd in avx.h…

aeb2eee

…, and single prec permute in avx.h

AVX512 int simd, int simd mask and permutes added

0b390a9

Added Permutes for CUDA and HIP

99ad380

Guarded permute in simd_common.h as it won't work for GPU architectures FIXME: need better guards for Clang + GPU targets Fixed: any_of all_of in HIP backend Fixed: potential mask overflow in cuda_warp

Balint Joo and others added 2 commits May 4, 2020 17:31

Added static make_permute() function in simd_utils< simd<T,Abi> > tra…

7c08800

…its class

Added simd_utils< simd<T,Abi>::make_permute() function for AVX and AV…

9440b03

…X512 Removed weird simd<int,> initialization for AVX512 -- essentially the zeros get put into a simd_t::storage_type by make_permute now.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/permute rebase13 #18

Feature/permute rebase13 #18

bjoo commented May 4, 2020

crtrott commented May 4, 2020

bjoo commented May 4, 2020 via email

bjoo commented May 4, 2020

ibaned commented Aug 13, 2020

Feature/permute rebase13 #18

Are you sure you want to change the base?

Feature/permute rebase13 #18

Conversation

bjoo commented May 4, 2020

crtrott commented May 4, 2020

bjoo commented May 4, 2020 via email

bjoo commented May 4, 2020

ibaned commented Aug 13, 2020