Restarting a simulation
PIConGPU supports restarting the simulation run from checkpoints stored on disk.
Checkpoints are special dumps of the simulation data that contain all information required for a restart. This may include internal data that is not required or intended for post-processing or analysis. To enable checkpoints, add the --checkpoint.period <period> flag to the PIConGPU command line, specifying the period, in time steps, at which checkpoints should be created. Each checkpoint step is recorded in a central checkpoint file.
Plugins will receive a special notification for a checkpoint every --checkpoint.period steps in addition to the standard notification, triggered every <plugin>.period steps.
Since restarts require most field and particle data, the HDF5Writer plugin must be enabled. Whenever a standard output notification and a checkpoint notification are triggered for HDF5Writer for the same timestep, both the checkpoint and the standard output are written. For information on other plugins, see their documentation.
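The interplay of the two periods can be sketched with a small shell loop. This is only an illustration: it assumes that a period p fires at every positive multiple of p and ignores step 0, and the function name list_notifications is hypothetical, not part of PIConGPU.

```shell
# Hypothetical helper: list which notifications fire at which time step.
# ASSUMPTION: a period p triggers at every positive multiple of p; step 0 is ignored.
list_notifications() {
    total_steps=$1
    output_period=$2
    checkpoint_period=$3
    step=1
    while [ "$step" -le "$total_steps" ]; do
        events=""
        if [ $((step % output_period)) -eq 0 ]; then
            events="output"
        fi
        if [ $((step % checkpoint_period)) -eq 0 ]; then
            # Append with a "+" separator only if an output event is already set.
            events="${events:+$events+}checkpoint"
        fi
        if [ -n "$events" ]; then
            echo "$step $events"
        fi
        step=$((step + 1))
    done
}

# 1024 steps, standard output every 128 steps, checkpoints every 512 steps.
list_notifications 1024 128 512
```

With these values, steps 512 and 1024 are reported as output+checkpoint, which corresponds to the case where both the checkpoint and the standard output are written.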
Restarting PIConGPU requires that checkpoints were created as described above and that the HDF5Writer plugin is enabled. To restart, set the following flag:
--checkpoint.restart
The last checkpoint is detected automatically by parsing the central checkpoint file in the restart directory.
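To make the idea of automatic detection concrete, the following sketch picks the largest step from a list of checkpoint steps. It rests on a loud assumption: that the central checkpoint file holds one step number per line. The actual file format is not described on this page and may differ, and last_checkpoint_step is a hypothetical name.

```shell
# ASSUMPTION: the central checkpoint file lists one checkpoint step per line.
# The real format is not documented here and may differ.
last_checkpoint_step() {
    checkpoint_file=$1
    # Keep only lines that are plain non-negative integers, then take the largest.
    grep -E '^[0-9]+$' "$checkpoint_file" | sort -n | tail -n 1
}

# Demo with a throwaway file standing in for the central checkpoint file.
demo_file=$(mktemp)
printf '512\n1024\n' > "$demo_file"
last_checkpoint_step "$demo_file"
rm -f "$demo_file"
```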
This example shows how to set flags to create checkpoints and to restart from them using the HDF5Writer plugin.
Checkpointing: Run a simulation with 8 GPUs for 1024 steps, dumping results every 128 steps and checkpointing every 512 steps. Dumps are created with the filename prefix "simData". Checkpoints use their default directory and filename prefix ("checkpoints/h5_checkpoint" for HDF5Writer checkpoints).
-d 2 2 2 -g 256 512 256 -s 1024 --checkpoint.period 512 --hdf5.period 128 --hdf5.file simData
Restart: Restart with the same GPU and grid configuration from the last checkpoint (1024). Since we used the default file names during checkpointing, we do not need to provide any file prefixes. After restart, simulate another 1024 steps, up until timestep 2048. Here, neither output nor checkpoints are written.
-d 2 2 2 -g 256 512 256 -s 2048 --checkpoint.restart
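The two argument lists above can be assembled from their parts, which makes explicit that -s names the final time step of the run rather than the number of additional steps. This is a minimal sketch: the helper names are hypothetical, and the functions only print arguments without launching PIConGPU.

```shell
# Hypothetical helpers that only print PIConGPU arguments; nothing is launched.
DEVICES="2 2 2"      # 2 x 2 x 2 = 8 GPUs
GRID="256 512 256"

checkpoint_run_args() {
    # $1: total steps, $2: checkpoint period, $3: output period, $4: output file prefix
    echo "-d $DEVICES -g $GRID -s $1 --checkpoint.period $2 --hdf5.period $3 --hdf5.file $4"
}

restart_run_args() {
    # $1: step of the last checkpoint, $2: number of additional steps to simulate
    # -s is the final time step, so it is the sum of both values.
    echo "-d $DEVICES -g $GRID -s $(( $1 + $2 )) --checkpoint.restart"
}

checkpoint_run_args 1024 512 128 simData
restart_run_args 1024 1024
```

The second call adds 1024 further steps to the checkpoint at step 1024, which yields -s 2048 as in the restart line above.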
By default, the checkpoint files are created below <run>/simOutput/checkpoints. This can be changed using
--checkpoint.directory <absolute or relative path>
which creates either /my/absolute/directory or <run>/simOutput/<relative directory>. Note that the creation of deep directory structures is currently not supported.
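The path rule for the checkpoint directory can be written out as a short shell function. It is a sketch of the rule as described here, not PIConGPU's own code; the function name resolve_checkpoint_dir and the example run directory /scratch/run01 are hypothetical.

```shell
# Hypothetical helper mirroring the rule described above:
#   absolute setting -> used as given
#   relative setting -> placed below <run>/simOutput
#   no setting       -> default "checkpoints"
resolve_checkpoint_dir() {
    run_dir=$1
    setting=${2:-checkpoints}
    case "$setting" in
        /*) echo "$setting" ;;
        *)  echo "$run_dir/simOutput/$setting" ;;
    esac
}

resolve_checkpoint_dir /scratch/run01
resolve_checkpoint_dir /scratch/run01 myCheckpoints
resolve_checkpoint_dir /scratch/run01 /my/absolute/directory
```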
Optionally, the flag --hdf5.checkpoint-file <absolute or relative filename prefix> may be set to specify a special filename for checkpoint files. If --hdf5.checkpoint-file is set with an absolute path, HDF5Writer ignores the application-wide --checkpoint.directory setting.
For restart, the default path can be modified using
--checkpoint.restart.directory <absolute or relative path>
and --hdf5.restart-file, to which the same rules apply as for checkpoints.
--checkpoint.restart.step <checkpoint/restart step>
can be added to set the restart step manually instead of using the central checkpoint file.
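The optional restart flags can be combined as in the following sketch. The helper name restart_args is hypothetical, and the function only prints the restart-related arguments; device and grid flags must be added as in the example above.

```shell
# Hypothetical helper: print restart-related arguments only.
# $1: final time step (-s), $2: restart directory (may be empty), $3: restart step (may be empty)
restart_args() {
    final_step=$1
    restart_dir=$2
    restart_step=$3
    args="-s $final_step --checkpoint.restart"
    if [ -n "$restart_dir" ]; then
        args="$args --checkpoint.restart.directory $restart_dir"
    fi
    if [ -n "$restart_step" ]; then
        args="$args --checkpoint.restart.step $restart_step"
    fi
    echo "$args"
}

# Defaults: last checkpoint from the default restart directory.
restart_args 2048 "" ""
# Explicit directory and an earlier checkpoint step.
restart_args 2048 /my/absolute/directory 512
```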
All wiki entries describe the dev branch. Features may be different in the current master branch.