WIP: clang build fixes and workarounds #16

iburyl · 2020-05-07T14:00:40Z

This is an initial list of changes needed to make it build-able by clang.
The list of fixes is not full and PR is here to make a notification about early findings, which may be useful anyway.

iburyl · 2020-05-07T14:35:57Z

experimental/bits/simd_x86.h

 		const auto __d = _mm512_cvtepi8_epi32(__a);
 		return _mm512_test_epi32_mask(__b, __b)
 		       | (_mm512_test_epi32_mask(__c, __c) << 16)
 		       | (_ULLong(_mm512_test_epi32_mask(__d, __d)) << 32);
 	      }
 	    else
 	      {
-		__builtin_memcpy(&__a, __mem + 16, 32);
+		__builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);


my compiler flags this operation to always overflow, since __m128i __a is 16 bytes thing

This looks broken, yes. I'll take a look.

Yes, this needs to be 16. Seems I have no test coverage for this case 😉

There is already some value from clang then :)

mattkretz · 2020-05-07T14:39:15Z

experimental/bits/simd.h

@@ -178,7 +184,7 @@ using __value_type_or_identity_t
 // }}}
 // __is_vectorizable {{{
 template <typename _Tp>
-struct __is_vectorizable : public std::is_arithmetic<_Tp>
+struct __is_vectorizable : public std::is_arithmetic<std::remove_reference_t<_Tp>>


why is this needed? _Tp should never be a reference. And references are not vectorizable. (Pointers might be - needs a proposal)

clang by some reason returns long long&& out of operator[] in my example:
https://godbolt.org/z/WJN7_M

Then whatever calls __is_vectorizable<decltype(x[0])> is incorrect.

Yes, thus a number of remove_references will be required to workaround that thing across other places.

iburyl · 2020-05-07T14:39:59Z

My hello world example, which I use to make initial build possible have only 3 remaining errors:

In file included from /std-simd/experimental/simd:59:
/std-simd/experimental/bits/simd_builtin.h:1144:9: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
        return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
/std-simd/experimental/bits/simd_builtin.h:1171:11: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
          return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
In file included from my_test.cpp:16:
In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:710:4: error: incomplete type 'std::experimental::parallelism_v2::_MaskImplX86Mixin' named in nested name specifier
                        __make_dependent_t<_Tp, _MaskImplX86Mixin>::__to_bits(
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_x86.h:347:8: note: forward declaration of 'std::experimental::parallelism_v2::_MaskImplX86Mixin'
struct _MaskImplX86Mixin;
       ^
/std-simd/experimental/bits/simd_x86.h:4250:3: warning: 'memcpy' will always overflow; destination buffer has size 16, but size argument is 32 [-Wfortify-source]
                __builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);
                ^
1 warning and 3 errors generated.

Are there ideas on how to address that?

mattkretz · 2020-05-07T14:40:41Z

experimental/bits/simd.h

@@ -1285,7 +1291,7 @@ struct __vector_type_n<_Tp, _Np,
  static constexpr size_t _Bytes = _Np * sizeof(_Tp) < __min_vector_size<_Tp>
 				     ? __min_vector_size<_Tp>
 				     : __next_power_of_2(_Np * sizeof(_Tp));
-  using type [[__gnu__::__vector_size__(_Bytes)]] = _Tp;
+  using type [[__gnu__::__vector_size__(_Bytes)]] = std::remove_reference_t<_Tp>;


again, _Tp should never be a reference

mattkretz · 2020-05-07T14:42:09Z

experimental/bits/simd.h

@@ -3559,8 +3565,7 @@ split(const simd_mask<typename _V::simd_type::value_type, _Ap>& __x)

 // }}}
 // split<_Sizes...>(simd) {{{
-template <size_t... _Sizes, typename _Tp, typename _Ap,
-	  typename = enable_if_t<((_Sizes + ...) == simd<_Tp, _Ap>::size())>>
+template <size_t... _Sizes, typename _Tp, typename _Ap, typename>


removing the SFINAE condition breaks the spec

SFINAE is there in declaration. Here it is definition. clang says it is redefinition of default argument

Declaration is at line 3314

OK thanks. I'll take a look

mattkretz · 2020-05-07T14:42:59Z

experimental/bits/simd_builtin.h

@@ -624,7 +624,7 @@ __convert_all(_From __v)
 	    return __vector_bitcast<_FromT, decltype(__n)::value>(__vv);
 	  };
 	  [[maybe_unused]] const auto __vi = __to_intrin(__v);
-	  auto&& __make_array = [](std::initializer_list<auto> __xs) {
+	  auto&& __make_array = [](auto __xs) {


This can't work. An initializer list argument cannot be deduced as one. But I have a fix coming up for this. Erich mentioned it yesterday.

Yes, we talked with Erich yesterday about that.

mattkretz · 2020-05-07T15:12:56Z

My hello world example, which I use to make initial build possible have only 3 remaining errors:

__make_dependent_t was a clever hack to make the second template argument a dependent type in this context. Maybe I should find a better fix. But that probably requires a lot of code motion and less architecture abstraction into different files. So I'd like to avoid that. Let me find out whether clang is actually correct to reject this code.

mattkretz · 2020-05-07T15:25:35Z

https://godbolt.org/z/NeTaNs apparently only Clang rejects this pattern. That doesn't prove Clang is wrong. Not sure where to start in the standard 😉

mattkretz · 2020-05-07T15:30:11Z

Try this:

@@ -405,7 +405,12 @@ __is_neon_abi()                                             
⸱                                                                                
 // }}}                                                                          
 // __make_dependent_t {{{                                                       
-template <typename, typename _Up> using __make_dependent_t = _Up;               
+template <typename, typename _Up> struct __make_dependent                       
+{                                                                               
+  using type = _Up;                                                             
+};                                                                              
+template <typename _Tp, typename _Up>                                           
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;           
⸱                                                                                
 // }}}                                                                          
 // ^^^ ---- type traits ---- ^^^

iburyl · 2020-05-07T15:40:52Z

Try this:

@@ -405,7 +405,12 @@ __is_neon_abi()                                             
⸱                                                                                
 // }}}                                                                          
 // __make_dependent_t {{{                                                       
-template <typename, typename _Up> using __make_dependent_t = _Up;               
+template <typename, typename _Up> struct __make_dependent                       
+{                                                                               
+  using type = _Up;                                                             
+};                                                                              
+template <typename _Tp, typename _Up>                                           
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;           
⸱                                                                                
 // }}}                                                                          
 // ^^^ ---- type traits ---- ^^^

That made this thing compile:

#include <experimental/simd>
int main()
{
  std::experimental::fixed_size_simd<int, 5> a;
  std::experimental::fixed_size_simd<float, 32> b;

  return 0;
}

iburyl · 2020-05-08T12:01:33Z

Basic arithmetic ops look working.

Next broken thing is here:

// _S_signmask, _S_absmask{{{
template <typename _V, typename = _VectorTraits<_V>>
static inline constexpr _V _S_signmask = __xor(_V() + 1, _V() - 1);

clang does not consider it a constant expression
https://godbolt.org/z/RpWrqw

mattkretz · 2020-05-08T12:09:54Z

clang does not consider it a constant expression https://godbolt.org/z/RpWrqw

That's an unfortunate restriction in Clang. The better solution would be if Clang were to support constant expressions involving [[gnu::vector_size(N)]] objects. (Note that I want to propose constexpr simd for the C++23 inclusion [1].) Until that happens we should define a macro to turn this kind of constexpr into a const. [1] mattkretz/std-simd-feedback#14

iburyl · 2020-05-08T15:24:06Z

stdx::abs is now working for integers, but is returning zeros for FP for unknown reason.

stdx::fabs works as expected with FP.

stdx::sin has compilation issues

mattkretz · 2020-05-08T18:05:38Z

experimental/bits/simd_detail.h

+#if __clang__
+#define _GLIBCXX_CONSTEXPR_SIMD
+#else
+#define _GLIBCXX_CONSTEXPR_SIMD constexpr
+#endif


My naming convention has been to prefix everything with _GLIBCXX_SIMD_. I understand naming the macro "constexpr simd" is trying to explain the purpose of the macro. SIMD_CONSTEXPR_SIMD is just confusing again, though.
<bits/c++config> has _GLIBCXX_USE_CONSTEXPR which seems like a good fit here. I.e. name it _GLIBCXX_SIMD_USE_CONSTEXPR.
And define the __clang__ branch to const.

mattkretz · 2020-05-08T18:10:17Z

stdx::abs is now working for integers, but is returning zeros for FP for unknown reason.

Interesting. Maybe you can check whether the _S_absmask constants are correct?

stdx::fabs works as expected with FP.

This is, IIRC, still just a loop over std::fabs.

iburyl · 2020-05-08T20:05:04Z

_S_allbits and _S_signmask are fine, but _S_absmask is not

__andnot(_S_signmask<_V>, _S_allbits<_V>); // does not work, I have no idea why
auto a =_S_signmask<_V>;
auto b = _S_allbits<_V>;
__andnot(a, b); // does work

iburyl · 2020-05-08T20:28:54Z

Last unresolved error for stdx::sin() to compile:

/std-simd/experimental/bits/simd_builtin.h:1663:12: error: call to '__and' is ambiguous
    return __and(__x._M_data, __y._M_data);
           ^~~~~
/std-simd/experimental/bits/simd.h:4569:14: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplBuiltin<std::experimental::parallelism_v2::simd_abi::_VecBuiltin<8> >::__bit_and<int, 2>' requested here
      _Impl::__bit_and(__data(__x), __data(__y)));
             ^
/std-simd/experimental/bits/simd_math.h:576:48: note: in instantiation of member function 'std::experimental::parallelism_v2::operator&' requested here
      const auto __need_sin = (__f._M_quadrant & 1) == 0;
                                               ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<4>::__sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
/std-simd/experimental/bits/simd_math.h:564:6: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_Fixed<4> >' requested here
            sin(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
            ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<32>::__sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
my_test.cpp:69:39: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_Fixed<32> >' requested here
  print_simd("stdx::sin( fa )", stdx::sin( fa ) );
                                      ^
/std-simd/experimental/bits/simd.h:1615:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(2 * sizeof(int)))) int, void>, _Dummy = <>]
__and(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
^
/std-simd/experimental/bits/simd.h:1626:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, $1 = __attribute__((__vector_size__(2 * sizeof(int)))) int]
__and(_Tp __a, _Tp __b) noexcept
^

mattkretz · 2020-05-08T21:26:06Z

_S_allbits and _S_signmask are fine, but _S_absmask is not

This smells like a compiler bug: https://godbolt.org/z/SYU0Fp

mattkretz · 2020-05-08T21:26:47Z

Last unresolved error for stdx::sin() to compile:

Interesting. Clang and GCC disagree on how overload resolution works here.

iburyl · 2020-05-13T11:04:45Z

Ok, my stdx::sin example works, and provides reasonable numbers.

Got a number of similar warnings though:

In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:2452:25: warning: '__builtin_is_constant_evaluated' will always evaluate to 'true' in a manifestly constant-evaluated expresnstant-evaluated]
    else if constexpr (!__builtin_is_constant_evaluated() && sizeof(__x) == 8) // {{{
                        ^

iburyl · 2020-05-13T14:18:08Z

Tried to compile tests.
Running via regular make quickly breaks with errors, thus I've applied direct approach to see wider picture for time-being.

for testfile in *.cpp; do
    $CXX -std=c++17 -Ivirtest -D__remove_cvref_t=std::remove_cvref_t \
        -Wno-everything \
        -D_GLIBCXX_SIMD_ABI=__sse $testfile
done

Those tests compiled:

abi.cpp
abi_fixed_size.cpp
bitset_conversions.cpp
mask.cpp
mask_conversions.cpp
proposed.cpp
sfinae.cpp
simd.cpp
specialmath.cpp
uninit.cpp

Those tests require __make_array workaround:

where.cpp

Internal compiler error

casts.cpp - internal compiler error
integer_operators.cpp - internal compiler error
loadstore.cpp - internal compiler error
math.cpp - internal compiler error

Other reasons:

experimental.cpp - no member named 'apply' in 'where_expression' (tests compilation issue?)
operators.cpp - some issues with __vector_bitcast<_Tp>(!__x._M_data);

2020.05.15: Updated based on __andnot fix
2020.05.19: Updated based on __make_array workaround and long double issue resolved

mattkretz · 2020-05-14T06:47:08Z

BTW, just so you're aware: I'm currently working on integrating this repo into libstdc++ and the relevant repo would be mattkretz/gcc with the mkretz/simd2 branch. However, I regularly force-push into that branch, so don't switch to working there.
But at some point I'll make a patch drop back into this repo. It's fine if we continue working like this. I just want to let you know that I probably won't be able to merge this PR as is, since I'll obviously try to get as many bug fixes as possible into the GCC branch ASAP.

iburyl · 2020-05-14T11:40:55Z

Then I'll continue trying to build it here.
But I need some insightful help.

Here is an error I get from tests

./../experimental/bits/simd_builtin.h:2724:14: error: no matching function for call to '__andnot'
      return __andnot(__x._M_data, _Abi::template __implicit_mask<_Tp>());
             ^~~~~~~~
...
./../experimental/bits/simd.h:1743:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('__attribute__((__vector_size__(4 * sizeof(int)))) int' (vector of 4 'int' values) vs. 'std::experimental::parallelism_v2::_SimdWrapper<int, 3, void>')
__andnot(_Tp __a, _Tp __b) noexcept
^
./../experimental/bits/simd.h:1717:1: note: candidate template ignored: requirement '!integral_constant<bool, true>::value' was not satisfied [with _Tp = __attribute__((__vector_size__(4 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(4 * sizeof(int)))) int, void>]
__andnot(_Tp __a, typename _TVT::type __b) noexcept
^

I have reproduced it with the following example:

  using fixed_i32_3 = stdx::simd<int, stdx::simd_abi::_VecBuiltin<12>>;
  fixed_i32_3 i3 = 1;
  where(!(i3 > 0), i3) = 2;

Update: resolved.

iburyl · 2020-05-15T14:00:45Z

I am observing strange behavior with long double so far.

Here ABI is deduced as scalar:

void foo(stdx::simd_mask<long double> a);

But that thing is deduced to _VecBuiltin(16) and it fails :

void foo() {
    stdx::simd_mask<long double>();
}

Looks like that:

my_test_2.cpp:19:10: error: call to implicitly-deleted default constructor of 'stdx::simd_mask<long double>'
    stdx::simd_mask<long double>();
         ^
/std-simd/experimental/bits/simd.h:4061:3: note: explicitly defaulted function was implicitly deleted here
  simd_mask() = default;
  ^
/std-simd/experimental/bits/simd.h:4038:19: note: default constructor of 'simd_mask<long double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>>' is implicitly deleted because base class '_SimdTraits<long double, _VecBuiltin<16>>::_MaskBase' (aka 'std::experimental::parallelism_v2::_UnsupportedBase') has a deleted default constructor
class simd_mask : public _SimdTraits<_Tp, _Abi>::_MaskBase
                  ^
/std-simd/experimental/bits/simd.h:546:3: note: '_UnsupportedBase' has been explicitly marked deleted here
  _UnsupportedBase() = delete;
  ^
1 error generated.

mattkretz · 2020-05-15T14:33:34Z

I am observing strange behavior with long double so far.

The error you see from constructing an object of type simd<long double, _VecBuiltin<16>> is expected. That's the result from using an unsupported ABI tag. The question is really why simd_abi::compatible<long double> would not be scalar.
https://github.com/VcDevel/std-simd/blob/master/experimental/bits/simd.h#L2304-L2307 would choose _VecBuiltin<16> if sizeof(long double) <= 8. That'd be strange for Linux.

iburyl · 2020-05-16T19:56:59Z

At the moment I see a weirdest behavior. In the same compilation unit sizeof(long double) is some times evaluated to 8 and some times to 16, and this mess destroys the logic for long double. Checking with Erich, if it is a known compiler bug.

iburyl · 2020-05-18T14:59:42Z

Ok, long double issue turned to be a problem of my custom compiler build. Resolved that on my side.

iburyl · 2020-05-18T15:21:32Z

@mattkretz, do you happen to think of __make_array workaround?

iburyl · 2020-05-19T09:08:41Z

Next thing I have, is absence of unary not operator in clang.
https://godbolt.org/z/r6eeZr

This fires in operators test.
Goes down to: simd_builtin.h

  template <typename _Tp, size_t _Np>
  _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
  __negate(_SimdWrapper<_Tp, _Np> __x) noexcept
  {
    return __vector_bitcast<_Tp>(!__x._M_data);
  }

iburyl · 2020-05-20T08:44:58Z

I feel like I've fixed/workarounded all known compilation problems up to Haswell.
Skylake have some missing builtins, e.g. __builtin_ia32_blendmb_512_mask

Now I need to run tests.

Direct compilation of tests does not work well - compiler fails on instantiation of huge test template functions: virtest/vir/test.h:1000:12: instantiating function definition {and here is a function declarartion 200KB long}

Is there a way to run the test system only for one instruction set, e.g. SSE and/or one given data type e.g. double?

mattkretz · 2020-05-25T08:12:02Z

Sorry, I dropped out for a few days because of other work + a public holiday here. First thing I'll do is to get my gcc branch merged back here for convenience. That'll be a patch drop, but that should help do duplicate less work.

iburylov added 2 commits May 7, 2020 16:52

initial clang build fixes and workarounds

a364375

second set of fixes

f2d541e

iburyl commented May 7, 2020

View reviewed changes

mattkretz reviewed May 7, 2020

View reviewed changes

__make_dependent_t fix

166d341

constexpr workaround

497120c

mattkretz reviewed May 8, 2020

View reviewed changes

workaround for _S_absmask issue

959de34

make stdx::sin work for float

d50057f

iburylov added 2 commits May 13, 2020 16:47

fixed using nullptr_t in sfinae test

4b22866

ambiguity refactoring

2faf842

iburylov added 3 commits May 15, 2020 11:49

align ambiguity fix with simd2 branch in gcc

62e261f

fix clang check

f27600d

one more clang check fix

6a484db

__make_array workaround

c594c4a

workaround for unary not

dd6f6a2

WIP: clang build fixes and workarounds #16

Are you sure you want to change the base?

WIP: clang build fixes and workarounds #16

Conversation

iburyl commented May 7, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iburyl commented May 7, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattkretz commented May 7, 2020

mattkretz commented May 7, 2020

mattkretz commented May 7, 2020

iburyl commented May 7, 2020

iburyl commented May 8, 2020

mattkretz commented May 8, 2020 via email

iburyl commented May 8, 2020

Choose a reason for hiding this comment

mattkretz commented May 8, 2020

iburyl commented May 8, 2020

iburyl commented May 8, 2020

mattkretz commented May 8, 2020

mattkretz commented May 8, 2020

iburyl commented May 13, 2020 • edited Loading

iburyl commented May 13, 2020 • edited Loading

mattkretz commented May 14, 2020

iburyl commented May 14, 2020 • edited Loading

iburyl commented May 15, 2020

mattkretz commented May 15, 2020

iburyl commented May 16, 2020

iburyl commented May 18, 2020

iburyl commented May 18, 2020

iburyl commented May 19, 2020

iburyl commented May 20, 2020

mattkretz commented May 25, 2020

iburyl commented May 13, 2020 •

edited

Loading

iburyl commented May 13, 2020 •

edited

Loading

iburyl commented May 14, 2020 •

edited

Loading