Debugging
Change your build type from Release to Debug:
pic-configure -c "-DCMAKE_BUILD_TYPE=Debug" [...]
You can add additional information to your output by adding -DPIC_VERBOSE=<N> and -DPMACC_VERBOSE=<M> to your cmake options during compile time (or use ccmake . after pic-configure [...]).
To activate multiple levels, simply add their values.
Example:
# PHYSICS (1) + CRITICAL(4) + SIMULATION_STATE(16)
pic-configure -c "-DCMAKE_BUILD_TYPE=Debug -DPIC_VERBOSE=21" ../paramSets/lwfa
From src/picongpu/include/debug/PIConGPUVerbose.hpp:
DEFINE_LOGLVL(0,NOTHING);
DEFINE_LOGLVL(1,PHYSICS);
DEFINE_LOGLVL(2,DOMAINS);
DEFINE_LOGLVL(4,CRITICAL);
DEFINE_LOGLVL(8,MEMORY);
DEFINE_LOGLVL(16,SIMULATION_STATE);
DEFINE_LOGLVL(32,INPUT_OUTPUT);
From src/libPMacc/include/debug/PMaccVerbose.hpp:
DEFINE_LOGLVL(0,NOTHING);
DEFINE_LOGLVL(1,MEMORY);
DEFINE_LOGLVL(2,INFO);
DEFINE_LOGLVL(4,CRITICAL);
DEFINE_LOGLVL(8,MPI);
DEFINE_LOGLVL(16,CUDA_RT);
DEFINE_LOGLVL(32,COMMUNICATION);
DEFINE_LOGLVL(64,EVENT);
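The PMacc levels combine in the same way. A minimal sketch, reusing the lwfa parameter set path from the example above (adjust it to your setup):
# MEMORY (1) + CRITICAL (4) + MPI (8)
pic-configure -c "-DCMAKE_BUILD_TYPE=Debug -DPMACC_VERBOSE=13" ../paramSets/lwfa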
A very useful tool to find out the resolved type of an object that produces a compile-time error is PMACC_CASSERT_MSG_TYPE(pmacc_msg,pmacc_typeInfo,...), defined in libPMacc/include/static_assert.hpp.
Therein:
- pmacc_msg can be a self-defined message but must be a valid C++ class name
- pmacc_typeInfo is the type to be resolved
- ... must be a condition that returns true or false
The static assert will fail if the condition is false and return the message and the resolved type.
Example:
PMACC_CASSERT_MSG_TYPE(
    This_is_the_resolved_type_of_MyObjectType,
    MyObjectType,
    1==2 // intentionally false, so the assert always fires and prints the resolved type
);
MyObjectType myObject; // this declaration crashed for you previously
myObject(arg1,arg2,...);
The following tools will benefit from additional information in your compiled binaries, such as source code lines.
Consider activating at least the following cmake flags with ccmake . after the configure step:
- -DCUDA_SHOW_CODELINES=ON: source code lines are known in cuda-gdb and in the ptx code (if kept via -DCUDA_KEEP_FILES=ON)
- -DPMACC_BLOCKING_KERNEL=ON: no parallel kernels any more -> cudaGetLastError() is now reported at the exact kernel that crashes
- -DCUPLA_STREAM_ASYNC_ENABLE=OFF: disable asynchronous streams (requires PIConGPU 0.4.0+)
- -DCUDA_NVCC_FLAGS_DEBUG="-g;-G": adds full in-device symbols, very long compile time, heavy runtime overhead
For more information on the flags or on how to use ccmake, see our documentation on available cmake flags.
A warning on debug flags: using -g/-G usually implies no code optimization (-O0). That might alter your code and can make it hard to track down race conditions.
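The debug flags can also be passed directly at configure time. A minimal sketch, assuming the same lwfa parameter set as above (paths are placeholders, not a tested configuration):
# hypothetical example: debug build with code lines and blocking kernels
pic-configure -c "-DCMAKE_BUILD_TYPE=Debug -DCUDA_SHOW_CODELINES=ON -DPMACC_BLOCKING_KERNEL=ON" ../paramSets/lwfa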
This page collects some useful hints about how to debug a hybrid (host + CUDA device) parallel (MPI) application.
Use the OpenMPI suppressions list
mpiexec <mpi flags> valgrind --suppressions=$MPI_ROOT/share/openmpi/openmpi-valgrind.supp picongpu ...
See also: Valgrind Manual, section 4.9 - Debugging MPI Parallel Programs with Valgrind
Suppression files for MPI, network layers and I/O libraries are often out-of-date or nonexistent, so we need to create our own in order not to drown in noise.
Assume you are debugging a feature that you can turn on/off (such as a plugin) in which you expect to find a memory access violation or leak.
The following workflow suppresses all "background" noise by running the program first, silencing all existing valgrind messages, and then running again with the feature to debug turned on.
- run PIConGPU without output, "train" the suppression file
- run again with the suppression file in the scenario that fails
This only helps if the memory violation is triggered at runtime and not already present in the first run:
valgrind --leak-check=full --show-reachable=yes --error-limit=no --gen-suppressions=all --log-file=picongpu_all.log \
./bin/picongpu -d 1 1 1 -g 12 80 12 --periodic 1 1 1 -s 10
cat picongpu_all.log | ./parse_valgrind_suppressions.sh > picongpu_all.supp
# assuming you debug the HDF5 plugin, enable it now
valgrind --leak-check=full --show-reachable=yes --error-limit=no --suppressions=./picongpu_all.supp --log-file=picongpu_hdf5.log \
./bin/picongpu -d 1 1 1 -g 12 80 12 --periodic 1 1 1 -s 10 --hdf5.period 10 --hdf5.file simData
parse_valgrind_suppressions.sh (originally from wxWidgets)
Multi-Node Host-Side
Log into an interactive shell/batch session with X-forwarding (ssh -X).
Launch PIConGPU with gdb and trigger start and backtrace automatically:
mpiexec <mpi flags> xterm -e gdb -ex r -ex bt --args picongpu ...
For non-interactive use, use gdb --batch -ex r -ex bt ... .
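A minimal sketch of such an interactive run, assuming two GPUs; the MPI flags and simulation arguments are placeholders:
# hypothetical two-rank run: each rank opens an xterm with gdb attached
mpiexec -n 2 xterm -e gdb -ex r -ex bt --args ./bin/picongpu -d 1 1 2 -g 12 80 24 -s 10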
mpiexec <mpi flags> cuda-memcheck --tool <memcheck|racecheck> picongpu ...
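For example, a single-rank memcheck run could look like this (a sketch; grid size and step count are placeholders):
# hypothetical single-rank run with the memcheck tool
mpiexec -n 1 cuda-memcheck --tool memcheck ./bin/picongpu -d 1 1 1 -g 12 80 12 -s 10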
Single-Node device-side
(!) Compile with nvcc -g -G <...> if you want to set device-side breakpoints.
cd <path>/simOutput
cuda-gdb --args <path2picongpu> -d 1 1 1 -g <...> -s 100 <...>
In cuda-gdb:
# set breakpoints before running the code (if the code line was not optimized out)
b <FileName>:<LineNumber>
# run program
r
# alternatively: start, then advance with next / step
# print a variable when code stopped or crashed
print <var>
# backtrace: where is the current line of code in the program
bt
# surrounding code lines
list
For non-interactive use, use cuda-gdb --batch -ex r -ex bt ... .
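Put together, a non-interactive sketch might look like this (simulation arguments are placeholders):
# hypothetical batch run: execute the program, then print a backtrace if it crashes
cuda-gdb --batch -ex r -ex bt --args <path2picongpu> -d 1 1 1 -g 12 80 12 -s 100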