diff --git a/docs/source/usage/plugins/openPMD.rst b/docs/source/usage/plugins/openPMD.rst
index 15e1deb43b..0e1617c549 100644
--- a/docs/source/usage/plugins/openPMD.rst
+++ b/docs/source/usage/plugins/openPMD.rst
@@ -113,7 +113,7 @@ PIConGPU command line option description
 ===================================== ====================================================================================================================================================
 ``--openPMD.period``                  Period after which simulation data should be stored on disk.
 ``--openPMD.source``                  Select data sources and filters to dump. Default is ``species_all,fields_all``, which dumps all fields and particle species.
-``--openPMD.file``                    Relative or absolute openPMD file prefix for simulation data. If relative, files are stored under ``simOutput``.
+``--openPMD.file``                    Relative or absolute openPMD file prefix for simulation data. If relative, files are stored under ``simOutput``.
 ``--openPMD.ext``                     openPMD filename extension (this controls the backend picked by the openPMD API).
 ``--openPMD.infix``                   openPMD filename infix (use to pick file- or group-based layout in openPMD). Set to NULL to keep empty (e.g. to pick group-based iteration layout).
 ``--openPMD.json``                    Set backend-specific parameters for openPMD backends in JSON format.
@@ -122,14 +122,14 @@ PIConGPU command line option description

 .. note::

-   This plugin is a multi plugin.
+   This plugin is a multi plugin.
    Command line parameters can be used multiple times to create e.g. dumps with different dumping periods.
    In the case where an optional parameter with a default value is explicitly defined, the parameter will always be passed to the instance of the multi plugin where the parameter is not set.
    e.g.

    .. code-block:: bash

-      --openPMD.period 128 --openPMD.file simData1 --openPMD.source 'species_all'
+      --openPMD.period 128 --openPMD.file simData1 --openPMD.source 'species_all'
       --openPMD.period 1000 --openPMD.file simData2 --openPMD.source 'fields_all' --openPMD.ext h5

    creates two plugins:
@@ -137,6 +137,36 @@ PIConGPU command line option description
 #. dump all species data each 128th time step, use HDF5 backend.
 #. dump all field data each 1000th time step, use the default ADIOS backend.

+Backend-specific notes
+^^^^^^^^^^^^^^^^^^^^^^
+
+HDF5
+====
+
+In order to avoid a performance bug for parallel HDF5 on the ORNL Summit compute system, a specific version of ROMIO should be chosen and performance hints should be passed:
+
+.. code-block:: bash
+
+   > export OMPI_MCA_io=romio321
+   > export ROMIO_HINTS=./my_romio_hints
+   > cat << EOF > ./my_romio_hints
+   romio_cb_write enable
+   romio_ds_write enable
+   cb_buffer_size 16777216
+   cb_nodes <number_of_nodes>
+   EOF
+
+Replace ``<number_of_nodes>`` with the number of nodes that your job uses.
+These settings are applied automatically in the Summit templates found in ``etc/picongpu/summit-ornl``.
+For further information, see the `official Summit documentation `_ and `this pull request for WarpX <https://github.com/ECP-WarpX/WarpX/pull/2495>`_.
+
+
+Performance
+^^^^^^^^^^^
+
+On the Summit compute system, specifying ``export IBM_largeblock_io=true`` disables data shipping, which leads to reduced overhead for large block write operations.
+This setting is applied in the Summit templates found in ``etc/picongpu/summit-ornl``.
+
 Memory Complexity
 ^^^^^^^^^^^^^^^^^

@@ -192,4 +222,4 @@ Notes on the implementation of a proper template file:

 * Most batch systems will forward all resource allocations of a batch script to launched parallel processes inside the batch script.
   When launching several processes asynchronously, resources must be allocated explicitly. This includes GPUs, CPU cores and often memory.
-* This setup is currently impossible to implement on the HZDR Hemera cluster due to a wrong configuration of the Batch system.
\ No newline at end of file
+* This setup is currently impossible to implement on the HZDR Hemera cluster due to a wrong configuration of the Batch system.
diff --git a/etc/picongpu/summit-ornl/gpu_batch.tpl b/etc/picongpu/summit-ornl/gpu_batch.tpl
index 8ac5557316..b118269b5d 100644
--- a/etc/picongpu/summit-ornl/gpu_batch.tpl
+++ b/etc/picongpu/summit-ornl/gpu_batch.tpl
@@ -88,6 +88,22 @@ export OMPI_MCA_coll_ibm_skip_barrier=true

 #jsrun -N 1 -n !TBG_nodes !TBG_dstPath/input/bin/cuda_memtest.sh

+# I/O tuning inspired by WarpX, see https://github.com/ECP-WarpX/WarpX/pull/2495
+# ROMIO has a hint for GPFS named IBM_largeblock_io which optimizes I/O with operations on large blocks
+export IBM_largeblock_io=true
+
+# MPI-I/O: ROMIO hints for parallel HDF5 performance
+export OMPI_MCA_io=romio321
+export ROMIO_HINTS=./romio-hints
+# number of hosts: unique node names minus batch node
+NUM_HOSTS=$(( $(echo $LSB_HOSTS | tr ' ' '\n' | uniq | wc -l) - 1 ))
+cat > romio-hints << EOL
+romio_cb_write enable
+romio_ds_write enable
+cb_buffer_size 16777216
+cb_nodes ${NUM_HOSTS}
+EOL
+
 #if [ $? -eq 0 ] ; then
   export OMP_NUM_THREADS=!TBG_coresPerGPU
   jsrun --nrs !TBG_tasks --tasks_per_rs 1 --cpu_per_rs !TBG_coresPerGPU --gpu_per_rs 1 --latency_priority GPU-CPU --bind rs --smpiargs="-gpu" !TBG_dstPath/input/bin/picongpu --mpiDirect !TBG_author !TBG_programParams | tee output
diff --git a/etc/picongpu/summit-ornl/gpu_batch_pipe.tpl b/etc/picongpu/summit-ornl/gpu_batch_pipe.tpl
index 4706c8b7a5..9a757ee8e4 100644
--- a/etc/picongpu/summit-ornl/gpu_batch_pipe.tpl
+++ b/etc/picongpu/summit-ornl/gpu_batch_pipe.tpl
@@ -110,6 +110,22 @@ export OMP_NUM_THREADS=!TBG_coresPerGPU
 # strategies that keep communication in one node.
 export OPENPMD_CHUNK_DISTRIBUTION=hostname_binpacking_binpacking

+# I/O tuning inspired by WarpX, see https://github.com/ECP-WarpX/WarpX/pull/2495
+# ROMIO has a hint for GPFS named IBM_largeblock_io which optimizes I/O with operations on large blocks
+export IBM_largeblock_io=true
+
+# MPI-I/O: ROMIO hints for parallel HDF5 performance
+export OMPI_MCA_io=romio321
+export ROMIO_HINTS=./romio-hints
+# number of hosts: unique node names minus batch node
+NUM_HOSTS=$(( $(echo $LSB_HOSTS | tr ' ' '\n' | uniq | wc -l) - 1 ))
+cat > romio-hints << EOL
+romio_cb_write enable
+romio_ds_write enable
+cb_buffer_size 16777216
+cb_nodes ${NUM_HOSTS}
+EOL
+
 # export LD_PROFILE_OUTPUT=`pwd`
 # export LD_PROFILE=libadios2_evpath.so
 jsrun --nrs !TBG_tasks --tasks_per_rs 1 --cpu_per_rs !TBG_coresPerGPU --gpu_per_rs 1 --latency_priority GPU-CPU --bind rs --smpiargs="-gpu" !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee ../output &
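
A quick way to sanity-check the ``cb_nodes`` value that both templates compute is to run the same host-counting pipeline on a sample ``LSB_HOSTS`` value. The snippet below is only a minimal sketch: the hostnames and slot layout are made up for illustration, assuming ``LSB_HOSTS`` lists the batch node plus one entry per allocated slot on each compute node, as the template comments suggest.

.. code-block:: bash

   # Hypothetical LSB_HOSTS value: one batch node ("batch1") plus two compute
   # nodes ("h41n01", "h41n02"), each repeated once per allocated slot.
   LSB_HOSTS="batch1 h41n01 h41n01 h41n01 h41n02 h41n02 h41n02"

   # Same pipeline as in the templates: one host per line, collapse adjacent
   # duplicates, count the unique names, and subtract the batch node.
   NUM_HOSTS=$(( $(echo $LSB_HOSTS | tr ' ' '\n' | uniq | wc -l) - 1 ))

   echo "cb_nodes ${NUM_HOSTS}"   # prints: cb_nodes 2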