Hello Everyone,
We are having trouble running BiocParallel within our SLURM cluster environment.
The foo.R script we are trying to run is:
library("BiocParallel")
library("Rmpi")
param <- SnowParam(workers = 3, type = "MPI")
FUN <- function(i) system("hostname", intern=TRUE)
bplapply(1:6, FUN, BPPARAM = param)
If we request an interactive job allocation, for example with
salloc -p mpi -N 2 -n 4 -t 1:00:00
and then start R with
mpiexec -np 1 R --no-save
and run the above script from this interactive shell, it works as expected.
However, if we try to run the same R script from within an sbatch job, the execution hangs for several seconds and eventually fails with the MPI error:
[compute-a-16-21:10780] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 193
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_dpm_dyn_init() failed
--> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
Does anyone have any idea why the primary R process is failing to start the other tasks?
Thank you
Raffaele
instead of R CMD BATCH or Rscript seems to work.
The execution still ends with an Open MPI error because the task simply dies, but at least it does run hostname on the distributed system.
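For anyone wanting to try this from a batch job, here is a minimal sketch of a job script that mirrors the interactive session from the original post. It rests on assumptions: the file name foo.sbatch is hypothetical, the resource flags are copied from the salloc example, and feeding foo.R to R on standard input is only one possible reading of "instead of R CMD BATCH or Rscript". Whether it avoids the MPI_Init timeout on a given cluster is not confirmed here.

```shell
# Sketch only: write a hypothetical job script, foo.sbatch, that requests the
# same resources as the interactive salloc example. Nothing is submitted here.
cat > foo.sbatch <<'EOF'
#!/bin/bash
#SBATCH -p mpi
#SBATCH -N 2
#SBATCH -n 4
#SBATCH -t 1:00:00

# Assumption: start a single R process under mpiexec and feed it foo.R on
# stdin, as in the interactive session, rather than using R CMD BATCH or Rscript.
mpiexec -np 1 R --no-save < foo.R
EOF
```

The script would then be submitted with sbatch foo.sbatch from the directory that contains foo.R.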