forked from gcc-mirror/gcc
Commit
aarch64: Use SVE ASRD instruction with Neon modes.
The ASRD instruction on SVE performs an arithmetic shift right by an immediate
for divide.

This patch enables the use of ASRD with Neon modes.

For example:

    int in[N], out[N];
    void
    foo (void)
    {
      for (int i = 0; i < N; i++)
        out[i] = in[i] / 4;
    }

compiles to:

    ldr     q31, [x1, x0]
    cmlt    v30.16b, v31.16b, #0
    and     z30.b, z30.b, 3
    add     v30.16b, v30.16b, v31.16b
    sshr    v30.16b, v30.16b, 2
    str     q30, [x0, x2]
    add     x0, x0, 16
    cmp     x0, 1024

but can just be:

    ldp     q30, q31, [x0], 32
    asrd    z31.b, p7/m, z31.b, #2
    asrd    z30.b, p7/m, z30.b, #2
    stp     q30, q31, [x1], 32
    cmp     x0, x2

This patch also adds the following overload:

    aarch64_ptrue_reg (machine_mode pred_mode, machine_mode data_mode)

Depending on the data mode, the function returns a predicate with the
appropriate bits set.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_ptrue_reg): New overload.
	* config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): Likewise.
	* config/aarch64/aarch64-sve.md: Extended sdiv_pow2<mode>3
	and *sdiv_pow2<mode>3 to support Neon modes.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/sve-asrd.c: New test.

Co-authored-by: Richard Sandiford <[email protected]>
Signed-off-by: Soumya AR <[email protected]>
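As an illustration of why a single shift-for-divide can stand in for the four-instruction Neon sequence above, the following scalar sketch models both on 8-bit lanes. It is not code from the patch: the function names are hypothetical, and it assumes that `>>` on a negative `int` is an arithmetic shift (true for GCC, implementation-defined in ISO C).

```c
#include <stdint.h>

/* Hypothetical scalar model of a signed divide by 2^shift that rounds
   toward zero, as C's `/` requires.  Negative inputs receive a bias of
   (2^shift - 1) before the arithmetic shift.  Assumes 1 <= shift <= 7
   and an arithmetic `>>` on negative values.  */
static int8_t
sdiv_pow2_model (int8_t x, unsigned shift)
{
  int bias = (x < 0) ? (1 << shift) - 1 : 0;
  return (int8_t) ((x + bias) >> shift);
}

/* The same computation written as the four steps of the Neon sequence
   quoted in the commit message: cmlt, and, add, sshr.  */
static int8_t
neon_sequence_model (int8_t x, unsigned shift)
{
  int mask = (x < 0) ? -1 : 0;            /* cmlt: all-ones if negative */
  int bias = mask & ((1 << shift) - 1);   /* and:  keep the low bits    */
  int sum = x + bias;                     /* add:  bias negative lanes  */
  return (int8_t) (sum >> shift);         /* sshr: arithmetic shift     */
}

/* Returns 1 if both models agree with C division for every int8_t
   value, 0 otherwise.  */
static int
models_match_c_division (unsigned shift)
{
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    {
      int expected = x / (1 << shift);
      if (sdiv_pow2_model ((int8_t) x, shift) != expected
          || neon_sequence_model ((int8_t) x, shift) != expected)
        return 0;
    }
  return 1;
}
```

Calling `models_match_c_division (2)` checks the divide-by-4 case from the example over all 256 input values. The bias step is what separates this from a plain arithmetic shift, which would round negative quotients toward minus infinity instead of toward zero.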
1 parent
65b7c8d
commit e5569a2
Showing 4 changed files with 115 additions and 12 deletions.
@@ -0,0 +1,86 @@
/* { dg-do compile } */
/* { dg-options "-Ofast --param aarch64-autovec-preference=asimd-only" } */
/* { dg-final { check-function-bodies "**" "" "" } } */

#include <stdint.h>

#define FUNC(TYPE, I) \
  TYPE M_##TYPE##_##I[I]; \
  void asrd_##TYPE##_##I () \
  { \
    for (int i = 0; i < I; i++) \
      { \
        M_##TYPE##_##I[i] /= 4; \
      } \
  }

/*
** asrd_int8_t_8:
** ...
** ptrue (p[0-7]).b, vl8
** ...
** asrd z[0-9]+\.b, \1/m, z[0-9]+\.b, #2
** ...
*/
FUNC(int8_t, 8);

/*
** asrd_int8_t_16:
** ...
** ptrue (p[0-7]).b, vl16
** ...
** asrd z[0-9]+\.b, \1/m, z[0-9]+\.b, #2
** ...
*/
FUNC(int8_t, 16);

/*
** asrd_int16_t_4:
** ...
** ptrue (p[0-7]).b, vl8
** ...
** asrd z[0-9]+\.h, \1/m, z[0-9]+\.h, #2
** ...
*/
FUNC(int16_t, 4);

/*
** asrd_int16_t_8:
** ...
** ptrue (p[0-7]).b, vl16
** ...
** asrd z[0-9]+\.h, \1/m, z[0-9]+\.h, #2
** ...
*/
FUNC(int16_t, 8);

/*
** asrd_int32_t_2:
** ...
** ptrue (p[0-7]).b, vl8
** ...
** asrd z[0-9]+\.s, \1/m, z[0-9]+\.s, #2
** ...
*/
FUNC(int32_t, 2);

/*
** asrd_int32_t_4:
** ...
** ptrue (p[0-7]).b, vl16
** ...
** asrd z[0-9]+\.s, \1/m, z[0-9]+\.s, #2
** ...
*/
FUNC(int32_t, 4);

/*
** asrd_int64_t_2:
** ...
** ptrue (p[0-7]).b, vl16
** ...
** asrd z[0-9]+\.d, \1/m, z[0-9]+\.d, #2
** ...
*/
FUNC(int64_t, 2);
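The test expects `ptrue ... .b, vl8` for every 64-bit vector and `vl16` for every 128-bit vector, whatever the element size. A rough model of why a byte-granular predicate works for wider elements is sketched below. It is not the GCC implementation of `aarch64_ptrue_reg`; the helper names are hypothetical. It relies only on the architectural layout in which an SVE predicate holds one bit per vector byte and an element is governed by the lowest of its bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: build a predicate with one bit per
   vector byte, setting a bit for every byte that the Neon-sized data
   occupies (8 bytes for 64-bit vectors, 16 for 128-bit vectors).  */
static uint64_t
ptrue_b_mask (size_t elem_bytes, size_t nelems)
{
  size_t vec_bytes = elem_bytes * nelems;
  return vec_bytes >= 64 ? UINT64_MAX : (UINT64_C (1) << vec_bytes) - 1;
}

/* An element of ELEM_BYTES bytes is active when the lowest predicate
   bit belonging to it is set.  Assumes idx * elem_bytes < 64.  */
static int
element_is_active (uint64_t mask, size_t elem_bytes, size_t idx)
{
  return (int) ((mask >> (idx * elem_bytes)) & 1);
}
```

Under this model, `ptrue_b_mask (2, 4)` and `ptrue_b_mask (1, 8)` give the same 8-bit mask, which matches the test's use of `vl8` for both `int16_t` x 4 and `int8_t` x 8: the four halfword lanes are active, and lane index 4 (outside the 64-bit vector) is not.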