OpenMPI/4.0.5-GCC-10.2.0 foss2020b mpirun error #755

Open
connorourke opened this issue Nov 3, 2021 · 1 comment

@connorourke

I am getting the following error when building FFTW/3.3.8/gompi-2020b with the foss-2020b toolchain.

Executing "mpirun -np 1 /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi/mpi-bench --verbose=1   --verify 'obcd52x48' --verify 'ibcd52x48' --verify 'ofcd52x48' --verify 'ifcd52x48' --verify 'obrd[11x2x4x12' --verify 'ibrd[11x2x4x12' --verify 'obcd[11x2x4x12' --verify 'ibcd[11x2x4x12' --verify 'ofcd[11x2x4x12' --verify 'ifcd[11x2x4x12' --verify 'obr[3x13v22' --verify 'ibr[3x13v22' --verify 'obc[3x13v22' --verify 'ibc[3x13v22' --verify 'ofc[3x13v22' --verify 'ifc[3x13v22' --verify 'okd]12o11x7e11x2o00' --verify 'ikd]12o11x7e11x2o00' --verify 'obr4x9x2' --verify 'ibr4x9x2' --verify 'ofr4x9x2' --verify 'ifr4x9x2' --verify 'obc4x9x2' --verify 'ibc4x9x2' --verify 'ofc4x9x2' --verify 'ifc4x9x2' --verify 'ok[6e10x7o01' --verify 'ik[6e10x7o01' --verify 'obr6x9x8x10' --verify 'ibr6x9x8x10' --verify 'ofr6x9x8x10' --verify 'ifr6x9x8x10' --verify 'obc6x9x8x10' --verify 'ibc6x9x8x10' --verify 'ofc6x9x8x10' --verify 'ifc6x9x8x10' --verify 'okd6e10x5o01x5e11x7o01' --verify 'ikd6e10x5o01x5e11x7o01' --verify 'ofr]5x20v4' --verify 'ifr]5x20v4' --verify 'obc]5x20v4' --verify 'ibc]5x20v4' --verify 'ofc]5x20v4' --verify 'ifc]5x20v4'"
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
FAILED mpirun -np 1 /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi/mpi-bench:  --verify 'obcd52x48' --verify 'ibcd52x48' --verify 'ofcd52x48' --verify 'ifcd52x48' --verify 'obrd[11x2x4x12' --verify 'ibrd[11x2x4x12' --verify 'obcd[11x2x4x12' --verify 'ibcd[11x2x4x12' --verify 'ofcd[11x2x4x12' --verify 'ifcd[11x2x4x12' --verify 'obr[3x13v22' --verify 'ibr[3x13v22' --verify 'obc[3x13v22' --verify 'ibc[3x13v22' --verify 'ofc[3x13v22' --verify 'ifc[3x13v22' --verify 'okd]12o11x7e11x2o00' --verify 'ikd]12o11x7e11x2o00' --verify 'obr4x9x2' --verify 'ibr4x9x2' --verify 'ofr4x9x2' --verify 'ifr4x9x2' --verify 'obc4x9x2' --verify 'ibc4x9x2' --verify 'ofc4x9x2' --verify 'ifc4x9x2' --verify 'ok[6e10x7o01' --verify 'ik[6e10x7o01' --verify 'obr6x9x8x10' --verify 'ibr6x9x8x10' --verify 'ofr6x9x8x10' --verify 'ifr6x9x8x10' --verify 'obc6x9x8x10' --verify 'ibc6x9x8x10' --verify 'ofc6x9x8x10' --verify 'ifc6x9x8x10' --verify 'okd6e10x5o01x5e11x7o01' --verify 'ikd6e10x5o01x5e11x7o01' --verify 'ofr]5x20v4' --verify 'ifr]5x20v4' --verify 'obc]5x20v4' --verify 'ibc]5x20v4' --verify 'ofc]5x20v4' --verify 'ifc]5x20v4'
make[3]: *** [Makefile:890: check-local] Error 1
make[3]: Leaving directory '/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi'
make[2]: *** [Makefile:754: check-am] Error 2
make[2]: Leaving directory '/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi'
make[1]: *** [Makefile:756: check] Error 2
make[1]: Leaving directory '/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi'
make: *** [Makefile:708: check-recursive] Error 1
 (at easybuild/tools/run.py:618 in parse_cmd_output)
== 2021-11-03 08:17:04,137 build_log.py:265 INFO ... (took 3 mins 51 secs)
== 2021-11-03 08:17:04,137 filetools.py:1971 INFO Removing lock /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/.locks/_scratch_cor22_bin_BUILD_EB_janus_easybuild_instances_fsv2_2020b_software_FFTW_3.3.8-gompi-2020b.lock...
== 2021-11-03 08:17:04,141 filetools.py:380 INFO Path /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/.locks/_scratch_cor22_bin_BUILD_EB_janus_easybuild_instances_fsv2_2020b_software_FFTW_3.3.8-gompi-2020b.lock successfully removed.
== 2021-11-03 08:17:04,141 filetools.py:1975 INFO Lock removed: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/.locks/_scratch_cor22_bin_BUILD_EB_janus_easybuild_instances_fsv2_2020b_software_FFTW_3.3.8-gompi-2020b.lock
== 2021-11-03 08:17:04,141 easyblock.py:3915 WARNING build failed (first 300 chars): cmd " export OMPI_MCA_rmaps_base_oversubscribe=true &&   make check " exited with exit code 2 and output:

Looks like it is down to the MPI installation, so I tried running a simple MPI hello-world program with OpenMPI/4.0.5-GCC-10.2.0 from the foss-2020b toolchain and got the following error.

I expect this is down to the compute instance (Azure Fsv2) I am running on being a single node, not connected to other nodes via InfiniBand. Maybe I need to add a hook to tell EasyBuild as much. Does anyone have any insight into how to solve this?

[ip-AC125814:62000] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62001] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61992] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62004] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61993] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61994] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61995] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61997] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61998] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61999] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62006] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62002] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61990] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61991] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61996] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62003] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62005] pml_ucx.c:291  Error: Failed to create UCP worker
[1635931846.182478] [ip-AC125814:62000:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.182457] [ip-AC125814:62001:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.182433] [ip-AC125814:61990:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.182433] [ip-AC125814:61992:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.184738] [ip-AC125814:61995:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.184783] [ip-AC125814:61997:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.185896] [ip-AC125814:61999:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186238] [ip-AC125814:62002:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.185291] [ip-AC125814:62004:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186381] [ip-AC125814:62006:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.185922] [ip-AC125814:61993:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186175] [ip-AC125814:61994:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.187805] [ip-AC125814:61996:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186041] [ip-AC125814:61998:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.187811] [ip-AC125814:62003:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186142] [ip-AC125814:61991:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.937278] [ip-AC125814:62005:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
 Hello World from process:            3 of           17
 Hello World from process:            5 of           17
 Hello World from process:            8 of           17
 Hello World from process:            7 of           17
 Hello World from process:            2 of           17
 Hello World from process:           10 of           17
 Hello World from process:            9 of           17
 Hello World from process:           12 of           17
 Hello World from process:           14 of           17
 Hello World from process:           11 of           17
 Hello World from process:            1 of           17
 Hello World from process:           15 of           17
 Hello World from process:           13 of           17
 Hello World from process:            0 of           17
 Hello World from process:           16 of           17
 Hello World from process:            4 of           17
 Hello World from process:            6 of           17
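If the single-node hypothesis is right, one way to check it (a sketch; `hello_world` is an illustrative binary name, not part of the build above) is to steer Open MPI away from the UCX PML entirely using standard MCA environment variables, so only shared-memory and TCP transports are used:

```shell
# Force the ob1 PML with the vader (shared memory), self and tcp BTLs,
# which is all a single-node run without InfiniBand needs. These are
# standard Open MPI 4.x MCA parameters; "hello_world" is illustrative.
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=vader,self,tcp
mpirun -np 4 ./hello_world
```

If the UCX errors disappear with these settings, that would confirm the failure comes from UCX probing the unusable mlx5_an0 device rather than from the MPI build itself.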

ompi_info gives:

  Package: Open MPI cor22@ip-AC125806 Distribution
                Open MPI: 4.0.5
  Open MPI repo revision: v4.0.5
   Open MPI release date: Aug 26, 2020
                Open RTE: 4.0.5
  Open RTE repo revision: v4.0.5
   Open RTE release date: Aug 26, 2020
                    OPAL: 4.0.5
      OPAL repo revision: v4.0.5
       OPAL release date: Aug 26, 2020
                 MPI API: 3.1.0
            Ident string: 4.0.5
                  Prefix: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/OpenMPI/4.0.5-GCC-10.2.0
 Configured architecture: x86_64-pc-linux-gnu
          Configure host: ip-AC125806
           Configured by: cor22
           Configured on: Tue Nov  2 11:59:25 GMT 2021
          Configure host: ip-AC125806
  Configure command line: '--prefix=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/OpenMPI/4.0.5-GCC-10.2.0'
                          '--build=x86_64-pc-linux-gnu'
                          '--host=x86_64-pc-linux-gnu'
                          '--enable-mpirun-prefix-by-default'
                          '--enable-shared' '--with-cuda=no'
                          '--with-hwloc=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/hwloc/2.2.0-GCCcore-10.2.0'
                          '--with-libevent=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/libevent/2.1.12-GCCcore-10.2.0'
                          '--with-ofi=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/libfabric/1.11.0-GCCcore-10.2.0'
                          '--with-pmix=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/PMIx/3.1.5-GCCcore-10.2.0'
                          '--with-ucx=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/UCX/1.9.0-GCCcore-10.2.0'
                          '--without-verbs'
                Built by: cor22
                Built on: Tue Nov  2 12:14:56 GMT 2021
              Built host: ip-AC125806
              C bindings: yes
            C++ bindings: no
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the gfortran compiler and/or Open
                          MPI, does not support the following: array
                          subsections, direct passthru (where possible) to
                          underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/GCCcore/10.2.0/bin/gcc
  C compiler family name: GNU
      C compiler version: 10.2.0
            C++ compiler: g++
   C++ compiler absolute: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/GCCcore/10.2.0/bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/GCCcore/10.2.0/bin/gfortran
         Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
           C++ profiling: no
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: yes
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
      MPI1 compatibility: no
          MPI extensions: affinity, cuda, pcollreq
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.0.5)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.0.5)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.5)
                 MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.5)
                 MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.5)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.0.5)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA event: external (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.0.5)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.0.5)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.0.5)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v4.0.5)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA pmix: ext3x (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.0.5)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.0.5)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.0.5)
           MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.0.5)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.5)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.0.5)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.0.5)
                MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.0.5)
                MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.0.5)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.0.5)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.0.5)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.0.5)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.0.5)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.0.5)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v4.0.5)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
@ocaisa

ocaisa commented Nov 9, 2021

We also saw this in EESSI/software-layer#136 for Azure. The fix is to export OMPI_MCA_pml=ucx; this is generally fixed in later versions of OpenMPI.
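Applying the suggested fix before retrying the build might look like this (a sketch: the easyconfig filename follows the usual EasyBuild naming convention and is assumed, not taken from the log above):

```shell
# Export the suggested MCA setting so "make check" inherits it,
# then rerun the build. The easyconfig name is illustrative.
export OMPI_MCA_pml=ucx
eb FFTW-3.3.8-gompi-2020b.eb --rebuild
```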

@boegel boegel added this to the 4.x milestone Nov 9, 2021