
CUDA aware Jacobi examples fail using PGI #24

Closed
dkokron opened this issue Oct 31, 2019 · 1 comment
Comments


dkokron commented Oct 31, 2019

I've been able to run the CUDA-aware and CUDA-normal Jacobi examples with hpcx-2.4.0 and HPE MPT (2.20r173) using the GNU 8.2.0 compilers. However, I get a segfault with the following trace when using PGI 19.5.

pgcc --version
pgcc 19.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake

MPT: #1 0x00002aaaab8d7b96 in mpi_sgi_system (
MPT: #2 MPI_SGI_stacktraceback (
MPT: header=header@entry=0x7fffffffbd40 "MPT ERROR: Rank 1(g:1) received signal SIGSEGV(11).\n\tProcess ID: 41511, Host: r101i0n0, Program: /nobackupp16/swbuild/dkokron/cuda/bin/jacobi_cuda_normal_mpi\n\tMPT Version: HPE MPT 2.20 05/28/19 04:16"...) at sig.c:340
MPT: #3 0x00002aaaab8d7d92 in first_arriver_handler (signo=signo@entry=11,
MPT: stack_trace_sem=stack_trace_sem@entry=0x2aaaaf380080) at sig.c:489
MPT: #4 0x00002aaaab8d812b in slave_sig_handler (signo=11,
MPT: siginfo=<optimized out>, extra=<optimized out>) at sig.c:565
MPT: #5 <signal handler called>
MPT: #6 0x0000000000404053 in CallJacobiKernel ()
MPT: #7 0x00000000004038b2 in RunJacobi (cartComm=3, rank=1, size=2,
MPT: domSize=0x7fffffffd100, topIndex=0x7fffffffd0d8, neighbors=0x7fffffffd0e4,
MPT: useFastSwap=0, devBlocks=0x7fffffffd160, devSideEdges=0x7fffffffd150,
MPT: devHaloLines=0x7fffffffd140, hostSendLines=0x7fffffffd130,
MPT: hostRecvLines=0x7fffffffd120, devResidue=0x2aeaf0220000,
MPT: copyStream=0xa152e90, iterations=0x7fffffffd174,
MPT: avgTransferTime=0x7fffffffd178) at Host.c:470
MPT: #8 0x0000000000401d8e in main (argc=4, argv=0x7fffffffd298) at Jacobi.c:78
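
Since the reported fault is inside CallJacobiKernel, one way to localize it is to check the CUDA runtime's error state right after the kernel launch. The sketch below is illustrative only and assumes the standard CUDA runtime API; the CHECK_CUDA macro and dummyKernel are hypothetical names, not taken from the example's Host.c.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not from the example's source): abort if a CUDA call fails. */
#define CHECK_CUDA(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",                   \
                    cudaGetErrorString(err_), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

/* Hypothetical stand-in for the Jacobi sweep launched by CallJacobiKernel. */
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = 0.0f;
}

int main(void)
{
    const int n = 1024;
    float *dData = NULL;

    CHECK_CUDA(cudaMalloc((void **)&dData, n * sizeof(float)));

    dummyKernel<<<(n + 255) / 256, 256>>>(dData, n);
    CHECK_CUDA(cudaGetLastError());       /* catches invalid launch configurations   */
    CHECK_CUDA(cudaDeviceSynchronize());  /* surfaces faults raised during execution */

    CHECK_CUDA(cudaFree(dData));
    return 0;
}

If cudaGetLastError() or cudaDeviceSynchronize() reports an error at this point, the fault would appear to be on the device side rather than in the MPI halo exchange.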

@jirikraus
Member

Hi dkokron, thanks for your report. I am closing this issue because I don't think it is an issue with the example; it should be filed either with the MPI stack you are using or with the PGI compiler. Feel free to reopen in case you disagree. To narrow it down, can you try the CUDA-aware OpenMPI that comes with the PGI installation, or another CUDA-aware OpenMPI build? In my tests the code passes with PGI 19.05, CUDA 10.1 and the CUDA-aware OpenMPI build (3.1.2) that comes with PGI. Thanks, Jiri
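
Before rerunning against a different MPI, it can help to confirm whether a given Open MPI build actually has CUDA-aware support enabled. Below is a minimal sketch assuming Open MPI's mpi-ext.h extensions; MPIX_CUDA_AWARE_SUPPORT and MPIX_Query_cuda_support are Open MPI-specific and are not provided by, for example, HPE MPT.

#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>  /* Open MPI extensions, including the CUDA-awareness query */
#endif

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Compile-time check: was this MPI library built with CUDA-aware support? */
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf("Compile time: CUDA-aware support is present.\n");
#elif defined(MPIX_CUDA_AWARE_SUPPORT)
    printf("Compile time: CUDA-aware support is absent.\n");
#else
    printf("Compile time: CUDA-aware support is unknown (no MPIX query available).\n");
#endif

    /* Run-time check: is CUDA-aware support actually enabled in this run? */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    printf("Run time: CUDA-aware support is %s.\n",
           MPIX_Query_cuda_support() ? "enabled" : "disabled");
#endif

    MPI_Finalize();
    return 0;
}

With Open MPI, the ompi_info tool can report the same compile-time flag from the shell, for example: ompi_info --parsable --all | grep mpi_built_with_cuda_support:value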
