
Performance Analysis

René Widera edited this page Jul 7, 2016 · 19 revisions



This page presents information on using performance analysis tools with PIConGPU.

Score-P

Update early 2016 (Score-P 1.X): Michael Sippel's Gist

Update 07/2016 (tested with Score-P 2.X):

Score-P is a measurement infrastructure combining several open-source performance analysis tools. It can trace and profile massively parallel applications, including hybrid MPI+CUDA programs.

PIConGPU has CMake support for Score-P. When building and installing Score-P, be sure to enable support for MPI and CUDA (and CUPTI).

<user>:<scorep-build-dir>$ ./configure ... --enable-mpi --enable-cuda

When configuring PIConGPU, use the Score-P wrapper scripts for the C++ and NVCC compiler. Example:

# Switch off instrumentation during the configure step by setting SCOREP_WRAPPER=OFF
<user>:<pic-build-dir>$ SCOREP_WRAPPER=OFF $PICSRC/configure -a sm_35 \
                            -c "-DCMAKE_CXX_COMPILER=`which scorep-g++` \
                            -DCUDA_NVCC_EXECUTABLE=`which scorep-nvcc`" \
                            ~/paramSets/case001

# Set instrumentation flags (--user if manual instrumentation is used)
<user>:<pic-build-dir>$ export SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--cuda --mpp=mpi"
<user>:<pic-build-dir>$ make -j
<user>:<pic-build-dir>$ make install

On titan@ORNL, use scorep-CC instead of scorep-g++.
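
Because the configure call above resolves the wrappers with `which`, a missing wrapper only shows up later as a confusing compiler error. The following is a minimal sketch of a pre-flight check; `check_tools` is a hypothetical helper name and not part of PIConGPU or Score-P.

```shell
#!/bin/sh
# Hypothetical helper: report every required tool that is missing from PATH.
# Returns 0 when all tools were found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage before configuring (pass scorep-CC instead of scorep-g++ on Titan):
#   check_tools scorep-g++ scorep-nvcc || exit 1
```

Passing the wrapper names as arguments keeps the same helper usable on systems with a different compiler wrapper.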

Tracing with OpenMP support:

  • If the Opari backend fails with error: ‘_ZTW9pomp_tpd_()’ is not a variable in clause ‘copyin’:
    • extend SCOREP_WRAPPER_INSTRUMENTER_FLAGS with --opari=--omp-tpd:--c++:--omp-tpd-mangling='gnu'
  • If the error "scorep_thread_create_wait_pthread.c:84: Fatal: Bug 'tpd == 0': Invalid Pthread thread specific data object. Please ensure that all pthread_create calls are instrumented." is triggered, try disabling Opari and use:
    • export SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--cuda --mpp=mpi --thread=omp:ancestry --nopomp"
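
The two OpenMP workarounds above can be written out as complete flag sets. This is only a sketch: `USE_OPARI` and the helper variable names are hypothetical and exist purely to switch between the two variants; the flag values themselves are the ones listed on this page.

```shell
#!/bin/sh
# Base flags from the build example on this page.
BASE_FLAGS="--cuda --mpp=mpi"

# Variant 1: keep Opari and pass it the thread-private-data options.
OPARI_FLAGS="$BASE_FLAGS --opari=--omp-tpd:--c++:--omp-tpd-mangling='gnu'"

# Variant 2: disable Opari (fallback for the 'tpd == 0' Pthread error).
NOPOMP_FLAGS="$BASE_FLAGS --thread=omp:ancestry --nopomp"

# USE_OPARI is a hypothetical switch, not a Score-P or PIConGPU variable.
if [ "${USE_OPARI:-yes}" = "yes" ]; then
    export SCOREP_WRAPPER_INSTRUMENTER_FLAGS="$OPARI_FLAGS"
else
    export SCOREP_WRAPPER_INSTRUMENTER_FLAGS="$NOPOMP_FLAGS"
fi
echo "SCOREP_WRAPPER_INSTRUMENTER_FLAGS=$SCOREP_WRAPPER_INSTRUMENTER_FLAGS"
```

Set the variable before running make, since the wrappers read it at compile time.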

Before executing PIConGPU, several Score-P environment variables must be set in your batch environment template script. Some template scripts already provide these environment variables, e.g. titan-ornl/batch_scorep_profile.tpl. For your own script, set at least the following (buffer sizes may vary):

export SCOREP_ENABLE_TRACING=yes
export SCOREP_CUDA_ENABLE=yes
export SCOREP_CUDA_BUFFER=200M
export SCOREP_TOTAL_MEMORY=1G
export SCOREP_FILTERING_FILE=!TBG_dstPath/tbg/scorep.filter
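
SCOREP_FILTERING_FILE must point to an existing filter file. As a rough sketch of what such a file could look like, the snippet below writes one using the Score-P region-name filter syntax. The excluded patterns are illustrative assumptions, not the filter shipped with PIConGPU, and `FILTER_FILE` is a hypothetical variable.

```shell
#!/bin/sh
# FILTER_FILE is a hypothetical variable; point it at the path that
# SCOREP_FILTERING_FILE refers to in your batch script.
FILTER_FILE="${FILTER_FILE:-./scorep.filter}"

cat > "$FILTER_FILE" <<'EOF'
# Illustrative patterns only: exclude small, frequently called library
# routines that would otherwise fill the trace buffers.
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE
    std::*
    boost::*
SCOREP_REGION_NAMES_END
EOF

echo "wrote $FILTER_FILE"
```

Which regions are worth excluding depends on the run; a profile of a short test run is usually the basis for choosing them.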

On success, a new directory named scorep-* is created, which contains the trace file traces.otf2. The trace can then be visualized with Vampir.
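
After several runs, more than one scorep-* directory may exist. The following sketch picks the most recently modified one and prints the path of its OTF2 file; `latest_trace` is a hypothetical helper, and it assumes the experiment directories sit directly below the given run directory.

```shell
#!/bin/sh
# Hypothetical helper: print the .otf2 file of the most recently modified
# scorep-* directory below the given run directory (default: current dir).
latest_trace() {
    run_dir="${1:-.}"
    # ls -td sorts by modification time, newest first.
    exp_dir=$(ls -td "$run_dir"/scorep-*/ 2>/dev/null | head -n 1)
    if [ -z "$exp_dir" ]; then
        echo "no scorep-* directory found in $run_dir" >&2
        return 1
    fi
    trace=$(find "$exp_dir" -maxdepth 1 -name '*.otf2' | head -n 1)
    if [ -z "$trace" ]; then
        echo "no .otf2 file in $exp_dir (was tracing enabled?)" >&2
        return 1
    fi
    echo "$trace"
}
```

The printed path can then be opened in Vampir, for example through its file dialog.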

Further information: