op/aarch64: refactor SVE functions #12683

ggouaillardet · 2024-07-16T04:40:15Z

Refactor SVE functions and incidentally make NVIDIA compilers a happy panda again.

ggouaillardet · 2024-07-16T04:46:07Z

@bosilca this is a refactor of the SVE functions, to make them more lookalike the SVE VLA examples provided by Arm.

I was initially unable to compile the op/aarch64 component with NVIDIA compilers 24.5 because a compiler issue.
The compiler bug can be evidenced with the following code (built on a Grace CPU fwiw)

#include <arm_sve.h>
#include <unistd.h>

#if 1
void ompi_op_aarch64_2buff_max_uint8_t_sve (const void *_in, void *_out, int *count) {
        int types_per_step = svcntb();
        size_t idx = 0, left_over = *count;
        uint8_t *in = (uint8_t *) _in, *out = (uint8_t *) _out;
        svuint8_t vsrc, vdst;
        svbool_t pred = svwhilelt_b8(idx, left_over);
        do {
            vsrc = svld1(pred, &in[idx]);
            vdst = svld1(pred, &out[idx]);
            vdst = svmax_x(pred, vdst, vsrc);
            svst1(pred, &out[idx], vdst);
            idx += types_per_step;
            pred = svwhilelt_b8(idx, left_over);
        } while (svptest_any(svptrue_b8(), pred));
    }
#endif

#if 1
void ompi_op_aarch64_2buff_max_int8_t_sve (const void *_in, void *_out, int *count) {
        int types_per_step = svcntb();
        size_t idx = 0, left_over = *count;
        int8_t *in = (int8_t *) _in, *out = (int8_t *) _out;
        svint8_t vsrc, vdst;
        svbool_t pred = svwhilelt_b8(idx, left_over);
        do {
            vsrc = svld1(pred, &in[idx]);
            vdst = svld1(pred, &out[idx]);
            vdst = svmax_x(pred, vdst, vsrc);
            svst1(pred, &out[idx], vdst);
            idx += types_per_step;
            pred = svwhilelt_b8(idx, left_over);
        } while (svptest_any(svptrue_b8(), pred));
    }

#endif

and then

$ nvc --version

nvc 24.5-1 linuxarm64 target on aarch64 Linux -tp neoverse-v2 
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

$ nvc -c test.c
NVC++-S-0000-Internal compiler error. cngtyp problem      42  (test.c: 36)
NVC++/arm Linux 24.5-1: compilation completed with severe errors

FWIW, if the source file contains only one subroutine, compilation works fine, but if it contains two or more subroutines, then the compiler crashes.

@bosilca @jeffhammond can one of you please report this internally to the compiler team?

jeffhammond · 2024-07-16T05:07:31Z

I reported on Slack. Will escalate if necessary.

ggouaillardet · 2024-07-16T06:02:46Z

Thanks @jeffhammond !

ompi/mca/op/aarch64/op_aarch64_functions.c

bosilca

I don't see a fundamental difference with the original code and I really don't care much if it looks like the ARM documentation on SVE. But, if restructuring the code has no performance difference and makes another compiler happy, I'm all for it.

cparrott73 · 2024-07-16T16:39:26Z

@jeffhammond alerted me to this issue. I will open an internal bug report on it here today. Sorry for any inconvenience caused by this issue.

We have already frozen development for the 24.7 release, but I'll get this into the pipeline to be fixed as soon as possible subsequent to that.

Refactor SVE functions and incidentally make NVIDIA compilers a happy panda again. Signed-off-by: Gilles Gouaillardet <[email protected]>

ggouaillardet · 2024-07-17T06:51:01Z

Performance is basically on par (with some "interesting" variations")

LLVM 18.1.8

length	STOCK	REFACTOR
0	0.03	0.03
4	0.04	0.04
8	0.04	0.04
16	0.04	0.04
32	0.04	0.04
64	0.04	0.04
128	0.04	0.04
256	0.04	0.04
512	0.05	0.05
1024	0.06	0.06
2048	0.08	0.09
4096	0.14	0.14
8192	0.26	0.26
16384	0.48	0.48
32768	0.93	0.93
65536	1.83	1.83
131072	3.59	3.59
262144	7.17	7.16
524288	14.39	14.35
1048576	28.58	28.68
2097152	59.41	59.70
4194304	135.84	135.37

ARM 24.04

length	STOCK	REFACTOR
0	0.03	0.03
4	0.04	0.04
8	0.04	0.04
16	0.04	0.04
32	0.04	0.04
64	0.04	0.04
128	0.04	0.04
256	0.04	0.04
512	0.05	0.05
1024	0.06	0.06
2048	0.08	0.08
4096	0.13	0.13
8192	0.23	0.23
16384	0.43	0.43
32768	0.83	0.83
65536	1.66	1.66
131072	3.19	3.18
262144	6.34	6.33
524288	12.71	12.71
1048576	25.79	25.73
2097152	60.02	59.74
4194304	137.73	135.04

GCC 14.1.0

length	STOCK	REFACTOR
0	0.03	0.03
4	0.04	0.04
8	0.04	0.04
16	0.04	0.04
32	0.04	0.04
64	0.04	0.04
128	0.04	0.04
256	0.04	0.04
512	0.04	0.04
1024	0.05	0.05
2048	0.07	0.07
4096	0.11	0.11
8192	0.20	0.20
16384	0.35	0.35
32768	0.67	0.67
65536	1.63	1.63
131072	3.12	3.12
262144	6.25	6.25
524288	12.57	12.60
1048576	25.56	25.50
2097152	59.33	59.92
4194304	134.40	136.26

cparrott73 · 2024-07-25T21:18:18Z

Just to update - our developer has identified the issue and we are testing a fix in our development branch. This fix seems to have resolved the above issue with SVE intrinsics. We believe it will land in the HPC SDK 24.9 release, but we will keep you posted. (Ping me if I forget to check in here after that.)

ggouaillardet added the enhancement label Jul 16, 2024

ggouaillardet requested a review from bosilca July 16, 2024 04:40

github-actions bot added the Target: main label Jul 16, 2024

devreal reviewed Jul 16, 2024

View reviewed changes

ompi/mca/op/aarch64/op_aarch64_functions.c Outdated Show resolved Hide resolved

bosilca reviewed Jul 16, 2024

View reviewed changes

op/aarch64: refactor SVE functions

ba59533

Refactor SVE functions and incidentally make NVIDIA compilers a happy panda again. Signed-off-by: Gilles Gouaillardet <[email protected]>

ggouaillardet force-pushed the topic/op_aarch64_sve_refactor branch from f4fc249 to ba59533 Compare July 17, 2024 01:47

devreal approved these changes Jul 17, 2024

View reviewed changes

ggouaillardet marked this pull request as ready for review July 17, 2024 06:52

bosilca approved these changes Jul 17, 2024

View reviewed changes

bosilca merged commit 069a8c4 into open-mpi:main Jul 17, 2024
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

op/aarch64: refactor SVE functions #12683

op/aarch64: refactor SVE functions #12683

ggouaillardet commented Jul 16, 2024

ggouaillardet commented Jul 16, 2024

jeffhammond commented Jul 16, 2024

ggouaillardet commented Jul 16, 2024

bosilca left a comment

cparrott73 commented Jul 16, 2024

ggouaillardet commented Jul 17, 2024

cparrott73 commented Jul 25, 2024

op/aarch64: refactor SVE functions #12683

op/aarch64: refactor SVE functions #12683

Conversation

ggouaillardet commented Jul 16, 2024

ggouaillardet commented Jul 16, 2024

jeffhammond commented Jul 16, 2024

ggouaillardet commented Jul 16, 2024

bosilca left a comment

Choose a reason for hiding this comment

cparrott73 commented Jul 16, 2024

ggouaillardet commented Jul 17, 2024

cparrott73 commented Jul 25, 2024