Saving Blosc-compressed chunks to an HDF5 dataset with OpenMPI results in a segmentation fault. Below is a minimal script that illustrates the issue:
```python
import numpy as np
import h5py
import blosc
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

def prepare(ch, y, x):
    start = (ch * rank, 0, 0)
    stop = (ch * (rank + 1), y, x)
    # use random data
    rng = np.random.default_rng(rank * 123)
    data = rng.random((ch, y, x), dtype=np.float32)
    return start, stop, data

def main():
    fp = "/dls/tmp/..../compressed_out.hdf5"
    # nframes should be divisible by rank size
    nframes = 1200
    y = 2160
    x = 2560
    global_shape = (nframes, y, x)
    ch = nframes // size

    # just some useful information
    if rank == 0:
        tot_sz = nframes * y * x * 4 / 2**30
        print(f"full data shape: ({nframes}, {y}, {x})")
        print(f"full data size (uncompressed): {tot_sz:.3f} Gb")
        print(f"each block size (uncompressed): {tot_sz / size:.3f} Gb")
        print(f"Blosc max buffer size: {float(blosc.BLOSC_MAX_BUFFERSIZE) / 2**30:.3f} Gb")

    start, stop, data = prepare(ch, y, x)

    with h5py.File(fp, "w", driver="mpio", comm=MPI.COMM_WORLD) as f:
        dset = f.create_dataset(
            "/data",
            shape=global_shape,
            dtype=data.dtype,
            chunks=(1, y, x),
            compression=32001,
            compression_opts=(0, 0, 0, 0, 5, 1, 1),
        )
        comm.Barrier()
        with dset.collective:
            dset[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]] = data
        comm.Barrier()

if __name__ == "__main__":
    main()
```
The above script produces some random data on each MPI rank and saves it to the HDF5 file with Blosc compression (`32001` is the registered filter ID of Blosc; in `compression_opts`, the `5` is the compression level, the second-to-last `1` enables shuffling, and the last `1` selects the `lz4` compressor).
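As a minimal sketch of how that tuple maps onto the filter's `cd_values` (slot meanings taken from the hdf5-blosc filter source; no values here beyond the ones already used above):

```python
BLOSC_FILTER_ID = 32001  # registered HDF5 filter ID for Blosc

clevel = 5      # cd_values[4]: compression level, 0-9
shuffle = 1     # cd_values[5]: 0 = none, 1 = byte shuffle, 2 = bit shuffle
compressor = 1  # cd_values[6]: 0 = blosclz, 1 = lz4, 2 = lz4hc, 3 = snappy, 4 = zlib, 5 = zstd

# cd_values[0..3] (filter revision, Blosc version, typesize, chunk bytes) are
# informational; the filter fills them in itself, so zeros are fine here.
compression_opts = (0, 0, 0, 0, clevel, shuffle, compressor)
```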
If the above script is run on the cluster with the following submission script:

```bash
#!/usr/bin/env bash
#SBATCH --partition=cs05r
#SBATCH --job-name=mpi-compressed
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 4
#SBATCH --cpus-per-task=1
#SBATCH --mem 100G
#SBATCH --time 30

# load your environment (openmpi as well)
# ....

# check using the expected MPI
type -p mpicc

srun python save_compressed_by_mpi.py
```
it will result in a segmentation fault, with an error log along the lines of the `fcoll_vulcan` backtrace discussed below.
My initial suspicion was that the data block is larger than the maximum buffer size in Blosc (around 2 GB, see here), since using `nframes = 300` (each data block is then `(300/4)*2160*2560*4 / 2**30 ~ 1.5 GB`) works. But upon more investigation, if you use 2 MPI processes (`#SBATCH --ntasks-per-node 2`), the above script works regardless of how large the data block is. Only 2, not more! So the Blosc maximum buffer size looks irrelevant, which makes sense: HDF5 applies the compression filter per chunk, and each `(1, y, x)` chunk here is only about 21 MB, far below the limit.
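A quick arithmetic check of the per-rank block sizes against Blosc's limit (a small sketch reusing the shapes above):

```python
import blosc

y, x, itemsize = 2160, 2560, 4  # float32 frames, as in the script above

for nframes, nranks in [(1200, 4), (300, 4), (1200, 2)]:
    block = (nframes // nranks) * y * x * itemsize  # bytes written per rank
    side = "over" if block > blosc.BLOSC_MAX_BUFFERSIZE else "under"
    print(f"nframes={nframes}, ranks={nranks}: {block / 2**30:.2f} GB per rank "
          f"({side} the ~{blosc.BLOSC_MAX_BUFFERSIZE / 2**30:.2f} GB limit)")
```

The 2-rank case writes roughly 12 GB per rank, well over the limit, and still succeeds, which is what rules the buffer size out as the culprit.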
Looking at the error log, it suggests something nasty is happening in calls within `fcoll_vulcan`. From the OpenMPI docs, `vulcan` is a component in the `fcoll` framework for MPI I/O operations. Changing the option to `dynamic` makes the segmentation fault disappear:
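```bash
# ... (submission script as before)
export OMPI_MCA_fcoll=dynamic
srun python save_compressed_by_mpi.py
```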
This works regardless of the number of MPI processes. Using `dynamic_gen2` seems to be slower, at least for me, partly because I am on OpenMPI 4.1.x.
As suggested in this issue, using `export OMPI_MCA_io=^ompio` also works (this excludes the `ompio` component altogether, so MPI-IO falls back to ROMIO).
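For completeness, the same MCA parameters can also be passed per launch instead of via the environment (assuming a plain `mpirun` launch; under `srun`, exporting the environment variables as above is the usual route):

```bash
# equivalent to: export OMPI_MCA_fcoll=dynamic
mpirun --mca fcoll dynamic -n 4 python save_compressed_by_mpi.py

# equivalent to: export OMPI_MCA_io=^ompio
mpirun --mca io '^ompio' -n 4 python save_compressed_by_mpi.py
```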
I suggest that when a centralised httomo environment is built, we use `export OMPI_MCA_fcoll=dynamic` as a workaround for now. From the issue above, this problem might be resolved in OpenMPI 5.x, so the workaround may only be relevant for 4.x.
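One way to bake this into a centralised conda environment would be an activation hook; a sketch below, where the file name and path are illustrative and any mechanism that sets the variable for users of the environment would do:

```bash
# hypothetical file: $CONDA_PREFIX/etc/conda/activate.d/ompi-fcoll-workaround.sh
# Workaround for OpenMPI 4.x ompio/fcoll_vulcan segfaults when collectively
# writing Blosc-compressed HDF5 chunks; revisit once OpenMPI 5.x is in use.
export OMPI_MCA_fcoll=dynamic
```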