I've been able to run both the CUDA-aware and the normal CUDA Jacobi examples with hpcx-2.4.0 and with HPE MPT (2.20r173) when building with the GNU-8.2.0 compilers. However, when building with pgi-19.5 I get a segfault with the following trace.
pgcc --version
pgcc 19.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
MPT: #1 0x00002aaaab8d7b96 in mpi_sgi_system (
MPT: #2 MPI_SGI_stacktraceback (
MPT: header=header@entry=0x7fffffffbd40 "MPT ERROR: Rank 1(g:1) received signal SIGSEGV(11).\n\tProcess ID: 41511, Host: r101i0n0, Program: /nobackupp16/swbuild/dkokron/cuda/bin/jacobi_cuda_normal_mpi\n\tMPT Version: HPE MPT 2.20 05/28/19 04:16"...) at sig.c:340
MPT: #3 0x00002aaaab8d7d92 in first_arriver_handler (signo=signo@entry=11,
MPT: stack_trace_sem=stack_trace_sem@entry=0x2aaaaf380080) at sig.c:489
MPT: #4 0x00002aaaab8d812b in slave_sig_handler (signo=11,
MPT: siginfo=<optimized out>, extra=<optimized out>) at sig.c:565
MPT: #5 <signal handler called>
MPT: #6 0x0000000000404053 in CallJacobiKernel ()
MPT: #7 0x00000000004038b2 in RunJacobi (cartComm=3, rank=1, size=2,
MPT: domSize=0x7fffffffd100, topIndex=0x7fffffffd0d8, neighbors=0x7fffffffd0e4,
MPT: useFastSwap=0, devBlocks=0x7fffffffd160, devSideEdges=0x7fffffffd150,
MPT: devHaloLines=0x7fffffffd140, hostSendLines=0x7fffffffd130,
MPT: hostRecvLines=0x7fffffffd120, devResidue=0x2aeaf0220000,
MPT: copyStream=0xa152e90, iterations=0x7fffffffd174,
MPT: avgTransferTime=0x7fffffffd178) at Host.c:470
MPT: #8 0x0000000000401d8e in main (argc=4, argv=0x7fffffffd298) at Jacobi.c:78
Hi dkokron, thanks for your report. I am closing this issue because I don't think the problem is in the example itself, so it should be filed against either the MPI stack you are using or the PGI compiler. Feel free to reopen if you disagree. To narrow it down, can you try the CUDA-aware OpenMPI that comes with the PGI installation, or another CUDA-aware OpenMPI build? In my tests the code passes with PGI 19.05, CUDA 10.1, and the CUDA-aware OpenMPI build that comes with PGI (3.1.2). Thanks, Jiri