Restarting a simulation

f-schmitt-zih edited this page Apr 28, 2014 · 15 revisions


PIConGPU supports restarting the simulation run from checkpoints stored on disk.

Checkpointing

Checkpoints are special dumps of the simulation data that contain all information required for a restart. They may include internal data that is neither required nor intended for post-processing or analysis. To enable checkpoints, add the

--checkpoints <period>

flag to the PIConGPU command line, where <period> is the number of timesteps between two checkpoints.

Every <period> steps, as set with --checkpoints, plugins receive a dedicated checkpoint notification. This comes in addition to the standard notification, which is triggered every <plugin>.period steps. Note that some plugins define additional checkpoint parameters, which must be set to enable checkpointing for that plugin.

Since restarts require most field and particle data, the HDF5Writer plugin must be enabled. When a standard output notification and a checkpoint notification are triggered for HDF5Writer on the same timestep, both the checkpoint and the standard output are written. For information on other plugins, see their documentation.
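The interplay of the two periods can be sketched in a few lines of Python. This is only an illustration, not PIConGPU code: the function names are made up, and it assumes that a notification fires whenever the timestep is a multiple of the period and that step 0 is left out.

```python
def notification_steps(total_steps, period):
    """Steps in 1..total_steps at which a notification with this period fires.

    Assumption: a notification fires whenever step % period == 0.
    Step 0 is deliberately left out of this sketch.
    """
    if period <= 0:
        return []
    return list(range(period, total_steps + 1, period))


def schedule(total_steps, checkpoint_period, output_period):
    """Map each active step to the kinds of HDF5Writer data written there."""
    checkpoints = set(notification_steps(total_steps, checkpoint_period))
    outputs = set(notification_steps(total_steps, output_period))
    return {
        step: [kind
               for kind, steps in (("output", outputs),
                                   ("checkpoint", checkpoints))
               if step in steps]
        for step in sorted(checkpoints | outputs)
    }


# Hypothetical run: 1024 steps, checkpoint every 512, output every 128.
plan = schedule(total_steps=1024, checkpoint_period=512, output_period=128)
for step, kinds in plan.items():
    print(step, "+".join(kinds))
```

Steps where both periods coincide are listed with both kinds, matching the rule that HDF5Writer then writes the standard output as well as the checkpoint.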

Restarts

Restarting PIConGPU requires that checkpoints were created as described in the section above and that the HDF5Writer plugin is enabled. If both conditions are met, set the following flags:

--restart --restart-step <checkpoint step>

Additional plugin-specific flags might be necessary to enable restarts.

Example

This example shows how to set the flags for creating checkpoints and for restarting from them using the HDF5Writer plugin.

Checkpointing: Run a simulation with 8 GPUs for 1024 steps, dumping results every 128 steps and checkpointing every 512 steps. Dumps are created with the filename prefix "simData"; checkpoints use their default directory and filename prefix ("checkpoints/h5_checkpoint" for HDF5Writer checkpoints).

-d 2 2 2 -g 256 512 256 -s 1024 --checkpoints 512 --hdf5.period 128 --hdf5.file simData 

Restart: Restart with the same GPU and grid configuration from the last checkpoint (step 1024). Since the default file names were used during checkpointing, no file prefixes need to be provided. After the restart, another 1024 steps are simulated, up to timestep 2048. Because neither --checkpoints nor --hdf5.period is passed, this run writes neither output nor checkpoints.

-d 2 2 2 -g 256 512 256 -s 2048 --restart --restart-step 1024 
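For scripted runs, the two flag lists above could be generated rather than typed by hand. The following Python sketch only builds the argument lists shown in this example; the helper names are hypothetical, the launcher and binary are deliberately left out, and `last_checkpoint_step` assumes that checkpoints fall on multiples of the checkpoint period.

```python
def checkpoint_run_args(devices, grid, steps, checkpoint_period,
                        output_period, output_prefix):
    """Flags for a run that writes regular dumps and checkpoints."""
    return ["-d", *map(str, devices),
            "-g", *map(str, grid),
            "-s", str(steps),
            "--checkpoints", str(checkpoint_period),
            "--hdf5.period", str(output_period),
            "--hdf5.file", output_prefix]


def last_checkpoint_step(steps, checkpoint_period):
    """Largest multiple of the period not exceeding steps (0 means none).

    Assumption: checkpoints are written at multiples of the period.
    """
    return (steps // checkpoint_period) * checkpoint_period


def restart_run_args(devices, grid, end_step, restart_step):
    """Flags for restarting from a checkpoint and running to end_step."""
    return ["-d", *map(str, devices),
            "-g", *map(str, grid),
            "-s", str(end_step),
            "--restart", "--restart-step", str(restart_step)]


devices, grid = (2, 2, 2), (256, 512, 256)
first = checkpoint_run_args(devices, grid, 1024, 512, 128, "simData")
step = last_checkpoint_step(1024, 512)
second = restart_run_args(devices, grid, 2048, step)
print(" ".join(first))
print(" ".join(second))
```

Note that `-s` in the restart run is the final timestep (2048), not the number of additional steps.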

Fine-tuning Checkpoints and Restarts

By default, the checkpoints directory is created below <run>/simOutput/. This can be changed using

--checkpoint-directory <absolute or relative path>

An absolute path is used as given (for example /my/absolute/directory), whereas a relative path is created below <run>/simOutput/, i.e. as <run>/simOutput/<relative directory>. Note that creating deep directory structures is currently not supported. Optionally, the flag --hdf5.checkpoint-file <absolute or relative filename prefix> may be set to specify a custom filename prefix for checkpoint files. If --hdf5.checkpoint-file is set to an absolute path, HDF5Writer ignores the application-wide --checkpoint-directory setting.

For restarts, the default location can be changed using

--restart-directory <absolute or relative path>

and --hdf5.restart-file, to which the same rules apply as to their checkpoint counterparts.
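The path rules described in this section can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the actual PIConGPU logic: the function names and the run directory /runs/example are hypothetical, the defaults are taken from the "checkpoints/h5_checkpoint" default mentioned above, and joining a relative file prefix onto the resolved directory is an assumption.

```python
import posixpath


def resolve_directory(run_dir, directory="checkpoints"):
    """Absolute paths are used as given; relative ones go below simOutput."""
    if posixpath.isabs(directory):
        return directory
    return posixpath.join(run_dir, "simOutput", directory)


def resolve_file_prefix(run_dir, directory="checkpoints",
                        file_prefix="h5_checkpoint"):
    """An absolute file prefix overrides the directory setting.

    Assumption: a relative prefix is placed inside the resolved directory.
    """
    if posixpath.isabs(file_prefix):
        return file_prefix
    return posixpath.join(resolve_directory(run_dir, directory), file_prefix)


run = "/runs/example"  # hypothetical run directory
print(resolve_file_prefix(run))
print(resolve_file_prefix(run, directory="/scratch/ckpt"))
print(resolve_file_prefix(run, "/scratch/ckpt", "/data/run1/ckpt"))
```

Under the statement that the same rules apply to restarts, the same two functions would also model --restart-directory and the restart file prefix.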