These experiments compare the costs of communication, computation, and CPU-GPU data movement for the kernels Filter, QR, Rayleigh-Ritz, and Residuals. We designed a weak-scaling experiment in which the number of compute nodes increases from 1 to 64 while the matrix size increases from 30k to 240k. Only the first iteration is reported, which keeps the workload per node fixed as the node count grows. The experiments are carried out on JUWELS-Booster with ChASE-GPU. For this experiment, we use artificial matrices of type Uniform, with nev and nex fixed to 2250 and 750, respectively.
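For reference, the pairing of node counts and matrix sizes keeps the per-node share of the matrix roughly constant. The short sketch below only illustrates this arithmetic; it assumes double-precision complex storage (16 bytes per element), which is an assumption of this illustration rather than something taken from the experiment scripts.

```python
# Illustration only: per-node memory footprint of the input matrix in the
# weak-scaling setup (node counts and matrix sizes taken from this README).
# Assumes double-precision complex storage (16 bytes/element).
BYTES_PER_ELEMENT = 16

for nodes, n in [(1, 30_000), (4, 60_000), (16, 120_000), (64, 240_000)]:
    total_gib = n * n * BYTES_PER_ELEMENT / 2**30
    per_node_gib = total_gib / nodes
    print(f"{nodes:2d} nodes, N={n:6d}: "
          f"matrix {total_gib:8.1f} GiB, ~{per_node_gib:.1f} GiB per node")
```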
The experiments require the generation of Uniform matrices of size 30k, 60k, 120k, and 240k through the scripts in `matGen`. The generated matrices should be available in the folder `data`.
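For orientation, a "Uniform" test matrix can be understood as a Hermitian matrix whose eigenvalues are drawn from a uniform distribution. The sketch below is a minimal, hypothetical construction at a small size; it is not the `matGen` generator, which should be used to produce the actual 30k-240k inputs.

```python
# Hypothetical sketch of a small "Uniform"-type test matrix: a Hermitian
# matrix A = Q diag(lambda) Q^H with eigenvalues sampled uniformly.
# This only illustrates the idea; use the matGen scripts for the real data.
import numpy as np

def uniform_hermitian(n, low=0.0, high=1.0, seed=0):
    rng = np.random.default_rng(seed)
    lam = rng.uniform(low, high, size=n)                   # uniform eigenvalues
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)                                 # random unitary Q
    return (q * lam) @ q.conj().T                          # Q diag(lam) Q^H

A = uniform_hermitian(500)
print(np.allclose(A, A.conj().T))                          # Hermitian check
```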
Building the GPU version of ChASE requires:
- a C/C++ compiler (GCC 11.3.0 tested)
- MPI (OpenMPI 4.1.4 tested)
- Intel MKL (version 2022.1.0 tested)
- CMake (version 3.23.1 tested)
- Boost (version 1.79.0 tested)
- git (version 2.36.0 tested)
- CUDA (version 11.7 tested)
Extracting the useful profiling data requires:
- grep
- Python3 (version 3.8.5 tested) with pandas (version 1.3.2 tested)
Plotting the results requires Python3 (version 3.8.5 tested) with the libraries:
- matplotlib (version 3.3.2 tested)
- pandas (version 1.3.2 tested)
The structure of this folder is given as follows:
```
├── ChASE-v1.2
├── ChASE-v1.4
├── nccl
│   ├── 1
│   ├── 4
│   ├── 16
│   ├── 64
│   ├── build.sh
│   ├── clean.sh
│   └── submit.sh
├── no-nccl
│   ├── 1
│   ├── 4
│   ├── 16
│   ├── 64
│   ├── build.sh
│   ├── clean.sh
│   └── submit.sh
├── v1.2.1
│   ├── 1
│   ├── 4
│   ├── 16
│   ├── 64
│   ├── build.sh
│   ├── clean.sh
│   └── submit.sh
├── data.py
└── README.md
```
The directory `ChASE-v1.2` provides a simplified version of ChASE v1.2.1 with the required timers inserted; this simplified version is provided because timers are not available in the release version v1.2.1. Similarly, a simplified version of ChASE v1.4 is provided in the directory `ChASE-v1.4`.
The build scripts are available in each folder and named `build.sh`.
The folders `1`, `4`, `16`, and `64` contain the job scripts for the experiments with 1, 4, 16, and 64 nodes on JUWELS-Booster, respectively. Finally, the bash script `submit.sh` is used to submit all the jobs in the folder.
The output of each case is stored in its own folder; it is first cleaned by the script `clean.sh` within each build folder. Finally, a CSV file that collects all the data is generated by `data.py`.
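As a rough illustration of this collection step (not the actual `data.py`, whose log format and column names are not reproduced here), the workflow amounts to extracting timer lines from each job output and aggregating them into a single table with pandas. The file layout `<variant>/<nodes>/output_*.txt` and the line format `"Kernel: <seconds>"` in this sketch are assumptions made purely for illustration.

```python
# Hypothetical sketch of the collection step performed by data.py:
# scan job outputs for timer lines and aggregate them into one CSV.
# File layout ("<variant>/<nodes>/output_*.txt") and the line format
# "Kernel: <seconds>" are assumptions for illustration only.
import glob, re
import pandas as pd

rows = []
for path in glob.glob("*/[0-9]*/output_*.txt"):
    variant, nodes = path.split("/")[:2]
    for line in open(path):
        m = re.match(r"\s*(Filter|QR|RR|Resid)\w*:\s*([0-9.eE+-]+)", line)
        if m:
            rows.append({"variant": variant, "nodes": int(nodes),
                         "kernel": m.group(1), "time_s": float(m.group(2))})

pd.DataFrame(rows).to_csv("collected_timings.csv", index=False)
```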
- go into the folder `no-nccl`: `cd no-nccl`
- build ChASE: `./build.sh`
- submit jobs: `./submit.sh`
- clean the output files of all jobs: `./clean.sh`
- go into the folder `nccl`: `cd nccl`
- build ChASE: `./build.sh`
- submit jobs: `./submit.sh`
- clean the output files of all jobs: `./clean.sh`
- go into the folder `v1.2.1`: `cd v1.2.1`
- build ChASE: `./build.sh`
- submit jobs: `./submit.sh`
- clean the output files of all jobs: `./clean.sh`
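If it is more convenient to drive all three variants from a single script, the sketch below loops over the folders listed above and runs their `build.sh` and `submit.sh` in sequence. It is only a convenience wrapper around the steps already described, not part of the provided scripts.

```python
# Convenience sketch (not part of the provided scripts): run build.sh and
# submit.sh for all three variants from the top-level folder.
import subprocess

for folder in ["no-nccl", "nccl", "v1.2.1"]:
    subprocess.run(["./build.sh"], cwd=folder, check=True)   # build ChASE
    subprocess.run(["./submit.sh"], cwd=folder, check=True)  # submit all jobs
```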
In the folder `commVScompute`, run `python ./data.py`.
The results will be saved as `../../results/comm_vs_compute_vs_cpy.csv`.
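To inspect the collected data before plotting, the CSV can be loaded with pandas. The snippet below only prints a preview, since the exact column names are defined by `data.py`.

```python
# Quick look at the collected timings (column names are defined by data.py).
import pandas as pd

df = pd.read_csv("../../results/comm_vs_compute_vs_cpy.csv")
print(df.head())
print(df.columns.tolist())
```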
All the Python-based plot scripts are available in the folder `plots`. To generate the plots for this experiment, run `python comm_vs_compt.py`.
The plots `Filter-GPU.pdf`, `QR-GPU.pdf`, `RR-GPU.pdf`, and `Resid-GPU.pdf` will be available in the folder `../../plots/pdf`.