Restarting a simulation

Axel Huebl edited this page Nov 14, 2017 · 15 revisions


PIConGPU supports restarting the simulation run from checkpoints stored on disk.

Checkpointing

Checkpoints are special dumps of the simulation data that contain all information required for a restart. This may include internal data that is not required/intended for post-processing or analysis. To enable checkpoints, add the

--checkpoint.period <period>

flag to the PIConGPU command line, specifying the period with which checkpoints should be created. Each checkpoint step is added to a central checkpoint file.

Plugins will receive a special notification for a checkpoint every --checkpoint.period steps in addition to the standard notification, triggered every <plugin>.period steps.

Since restarts require most field and particle data, the HDF5Writer plugin must be enabled. Whenever a standard output notification and a checkpoint notification are triggered for HDF5Writer for the same timestep, both the checkpoint and the standard output are written. For information on other plugins, see their documentation.

Restarts

Restarting PIConGPU requires that checkpoints are created as shown in the above section and the HDF5Writer plugin is enabled. In this case, set the following flag:

--checkpoint.restart

The last checkpoint is detected automatically by parsing the central checkpoint file in the restart directory.

Example

This example shows how to set flags to create checkpoints and restarts using the HDF5Writer plugin.

Checkpointing: Run a simulation with 8 GPUs for 1024 steps, dumping results every 128 steps and checkpointing every 512 steps. Dumps are created with the filename prefix "simData", checkpoints use their default directory and filename prefix ("checkpoints/h5_checkpoint" for HDF5Writer checkpoints).

-d 2 2 2 -g 256 512 256 -s 1024 --checkpoint.period 512 --hdf5.period 128 --hdf5.file simData 

Restart: Restart with the same GPU and grid configuration from the last checkpoint (here: step 1024). Since we used the default file names during checkpointing, no file prefixes need to be provided. After the restart, the simulation runs for another 1024 steps, up to timestep 2048. Because neither --hdf5.period nor --checkpoint.period is set, no output or checkpoints are written during this run.

-d 2 2 2 -g 256 512 256 -s 2048 --checkpoint.restart 

Modifying Defaults for Checkpoints and Restarts

By default, the checkpoint files are created below <run>/simOutput/checkpoints. This can be changed using

--checkpoint.directory <absolute or relative path>

which creates either /my/absolute/directory or <run>/simOutput/<relative directory>. Note that creating deep directory structures is currently not supported. Optionally, the flag --hdf5.checkpoint-file <absolute or relative filename prefix> may be set to specify a special filename prefix for checkpoint files. If --hdf5.checkpoint-file is set to an absolute path, HDF5Writer ignores the application-wide --checkpoint.directory setting.
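As a sketch, the checkpointing example from above could be combined with these flags to place checkpoints in a custom location (the directory name "my_checkpoints" and the prefix "ckpt" are placeholder names chosen for illustration):

-d 2 2 2 -g 256 512 256 -s 1024 --checkpoint.period 512 --checkpoint.directory my_checkpoints --hdf5.checkpoint-file ckpt

With a relative directory as shown, checkpoint files would be written below <run>/simOutput/my_checkpoints.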

For restart, the default path can be modified using

--checkpoint.restart.directory <absolute or relative path>

and the flag --hdf5.restart-file, to which the same rules as for --hdf5.checkpoint-file apply.

--checkpoint.restart.step <checkpoint/restart step>

can be added to set the restart step manually instead of reading it from the central checkpoint file.
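For example, the restart command from the example above could be extended as follows to restart from the checkpoint at step 512 in a non-default directory (a sketch; "my_checkpoints" is a placeholder directory name):

-d 2 2 2 -g 256 512 256 -s 2048 --checkpoint.restart --checkpoint.restart.directory my_checkpoints --checkpoint.restart.step 512

This bypasses the automatic detection of the last checkpoint and resumes the simulation at timestep 512 instead.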