
Collective message failure with PSM2 #60

Description

@patrick-legi

Hi,
For several weeks I have been trying to understand a problem (incorrect behavior) with Fortran MPI_ALLTOALLW calls. The problem only occurs on a Debian supercomputer that uses this opa-psm2 library for its Omni-Path architecture. Two OpenMPI developers and I have tested many other architectures (Intel and AMD CPUs; Ethernet, Omni-Path, and InfiniBand networks; Red Hat and SUSE operating systems), and the problem does not occur in any of those tests. Moreover, if I build OpenMPI on the Debian machine with the --without-psm2 flag, the problem goes away, but Omni-Path performance is no longer reached.
I'm building OpenMPI 4.0.5 with gcc 6.3 or gcc 10.2 (same behavior with both).
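
For reference, configure lines along these lines reproduce the two builds I compared; the install prefixes are placeholders rather than the exact paths used here, and the mpirun line at the end is an untested alternative way to take the PSM2 MTL out of the picture without rebuilding:

    # default build: the PSM2 MTL is selected on Omni-Path (problem occurs)
    ./configure --prefix=$HOME/openmpi-4.0.5 CC=gcc FC=gfortran
    # workaround build: PSM2 support disabled (problem gone, Omni-Path performance lost)
    ./configure --prefix=$HOME/openmpi-4.0.5-nopsm2 --without-psm2 CC=gcc FC=gfortran
    # run-time alternative: exclude the PSM2 MTL via MCA instead of rebuilding
    mpirun --mca mtl ^psm2 -np 4 ./test_layout_array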

Please find attached a very small test case that demonstrates the problem; a minimal sketch of the call pattern it exercises is shown after the run steps below. If everything runs fine it prints " Test pass!"; otherwise it prints the wrong values and calls MPI_ABORT().
To run this test:

  1. make
  2. mpirun -np 4 ./test_layout_array
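
The attached archive remains the authoritative reproducer. For readers without it, here is a minimal sketch of the kind of Fortran MPI_ALLTOALLW exchange being tested; the program name, the one-INTEGER-per-rank layout, and the verification step are all invented for this sketch, not taken from the attached source:

    program alltoallw_sketch
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, isize, i
      integer, allocatable :: sendbuf(:), recvbuf(:)
      integer, allocatable :: counts(:), displs(:), types(:)

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

      allocate(sendbuf(nprocs), recvbuf(nprocs))
      allocate(counts(nprocs), displs(nprocs), types(nprocs))

      ! Every rank sends one default INTEGER to every rank.
      ! MPI_ALLTOALLW displacements are expressed in bytes.
      call MPI_TYPE_SIZE(MPI_INTEGER, isize, ierr)
      sendbuf = rank
      recvbuf = -1
      counts  = 1
      types   = MPI_INTEGER
      do i = 1, nprocs
         displs(i) = (i - 1) * isize
      end do

      call MPI_ALLTOALLW(sendbuf, counts, displs, types, &
                         recvbuf, counts, displs, types, &
                         MPI_COMM_WORLD, ierr)

      ! Slot i of recvbuf must hold the sender's rank, i.e. i-1.
      do i = 1, nprocs
         if (recvbuf(i) /= i - 1) then
            print *, 'rank', rank, ': slot', i, 'expected', i - 1, &
                     'got', recvbuf(i)
            call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
         end if
      end do
      if (rank == 0) print *, ' Test pass!'

      call MPI_FINALIZE(ierr)
    end program alltoallw_sketch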

Patrick

DEBUG.tar.gz
