You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[simd_neon]refine the S_to_bits to improve "find_first/last_set" for neon
find_first_set and find_last_set method is not optimal for neon,
it need to be improved by synthesized with horizontal adds(vaddv)
which will reduce the generated assembly code; in the following cases,
vaddvq_s16 will generate 2 instructions but vpadd_s16 will generate 4
instrunctions:
```
# vaddvq_s16
vaddvq_s16(__asint);
// addv h0, v1.8h
// smov w1, v0.h[0]
# vpadd_s16
vpaddq_s16(vpaddq_s16(vpaddq_s16(__asint, __zero), __zero), __zero)[0]
// addp v1.8h,v1.8h,v2.8h
// addp v1.8h,v1.8h,v2.8h
// addp v1.8h,v1.8h,v2.8h
// smov w1, v1.h[0]
#
```
0 commit comments