These experiments compare the costs of communication, computation, and CPU-GPU data movement for the kernels Filter, QR, Rayleigh-Ritz, and Residuals. We designed a weak-scaling experiment in which the number of compute nodes increases from 1 to 64 while the matrix size increases from 30k to 240k. Only the first iteration is reported, which keeps the workload per node fixed as the node count grows. The experiments are carried out on JUWELS-Booster with ChASE-GPU. For this experiment, we use artificial matrices of type Uniform, with nev and nex fixed to 2250 and 750, respectively.
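For reference, the pairing of node counts and matrix sizes keeps the per-node share of the matrix roughly constant. The short sketch below only illustrates this arithmetic; it assumes double-precision complex storage (16 bytes per element), which is an assumption of this illustration rather than something taken from the experiment scripts.

```python
# Illustration only: per-node memory footprint of the input matrix in the
# weak-scaling setup (node counts and matrix sizes taken from this README).
# Assumes double-precision complex storage (16 bytes/element).
BYTES_PER_ELEMENT = 16

for nodes, n in [(1, 30_000), (4, 60_000), (16, 120_000), (64, 240_000)]:
    total_gib = n * n * BYTES_PER_ELEMENT / 2**30
    per_node_gib = total_gib / nodes
    print(f"{nodes:2d} nodes, N={n:6d}: "
          f"matrix {total_gib:8.1f} GiB, ~{per_node_gib:.1f} GiB per node")
```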
The experiments require the generation of Uniform matrices of size 30k, 60k, 120k, and 240k through the scripts in `matGen`. The generated matrices should be available in the folder `data`.
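For orientation, a "Uniform" test matrix can be understood as a Hermitian matrix whose eigenvalues are drawn from a uniform distribution. The sketch below is a minimal, hypothetical construction at a small size; it is not the `matGen` generator, which should be used to produce the actual 30k-240k inputs.

```python
# Hypothetical sketch of a small "Uniform"-type test matrix: a Hermitian
# matrix A = Q diag(lambda) Q^H with eigenvalues sampled uniformly.
# This only illustrates the idea; use the matGen scripts for the real data.
import numpy as np

def uniform_hermitian(n, low=0.0, high=1.0, seed=0):
    rng = np.random.default_rng(seed)
    lam = rng.uniform(low, high, size=n)                   # uniform eigenvalues
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)                                 # random unitary Q
    return (q * lam) @ q.conj().T                          # Q diag(lam) Q^H

A = uniform_hermitian(500)
print(np.allclose(A, A.conj().T))                          # Hermitian check
```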
Building the GPU version of ChASE requires:
- a C/C++ compiler (GCC 11.3.0 tested)
- MPI (OpenMPI 4.1.4 tested)
- Intel MKL (version 2022.1.0 tested)
- CMake (version 3.23.1 tested)
- Boost (version 1.79.0 tested)
- git (version 2.36.0 tested)
- CUDA (version 11.7 tested)
Extracting the useful profiling data requires:
- grep
- Python3 (version 3.8.5 tested) with pandas (version 1.3.2 tested)
Plotting the results requires Python3 (version 3.8.5 tested) with the libraries:
- matplotlib (version 3.3.2 tested)
- pandas (version 1.3.2 tested)
The structure of this folder is given as follows:
```
├── ChASE-v1.2
├── ChASE-v1.4
├── nccl
│   ├── 1
│   ├── 4
│   ├── 16
│   ├── 64
│   ├── build.sh
│   ├── clean.sh
│   └── submit.sh
├── no-nccl
│   ├── 1
│   ├── 4
│   ├── 16
│   ├── 64
│   ├── build.sh
│   ├── clean.sh
│   └── submit.sh
├── v1.2.1
│   ├── 1
│   ├── 4
│   ├── 16
│   ├── 64
│   ├── build.sh
│   ├── clean.sh
│   └── submit.sh
├── data.py
└── README.md
```
The directory `ChASE-v1.2` provides a simplified version of ChASE v1.2.1 with the required timers inserted; this simplified version is provided because timers are not available in the release version v1.2.1. Similarly, a simplified version of ChASE v1.4 is provided in the directory `ChASE-v1.4`.
The build scripts are available in each folder and named `build.sh`.
The folders `1`, `4`, `16`, and `64` contain the job scripts for the experiments with 1, 4, 16, and 64 nodes on JUWELS-Booster, respectively. Finally, the bash script `submit.sh` is used to submit all the jobs in the folder.
The output of each case is stored in its own folder; it is first cleaned by the script `clean.sh` within each build folder. Finally, a CSV file that collects all the data is generated by `data.py`.
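As a rough illustration of this collection step (not the actual `data.py`, whose log format and column names are not reproduced here), the workflow amounts to extracting timer lines from each job output and aggregating them into a single table with pandas. The file layout `<variant>/<nodes>/output_*.txt` and the line format `"Kernel: <seconds>"` in this sketch are assumptions made purely for illustration.

```python
# Hypothetical sketch of the collection step performed by data.py:
# scan job outputs for timer lines and aggregate them into one CSV.
# File layout ("<variant>/<nodes>/output_*.txt") and the line format
# "Kernel: <seconds>" are assumptions for illustration only.
import glob, re
import pandas as pd

rows = []
for path in glob.glob("*/[0-9]*/output_*.txt"):
    variant, nodes = path.split("/")[:2]
    for line in open(path):
        m = re.match(r"\s*(Filter|QR|RR|Resid)\w*:\s*([0-9.eE+-]+)", line)
        if m:
            rows.append({"variant": variant, "nodes": int(nodes),
                         "kernel": m.group(1), "time_s": float(m.group(2))})

pd.DataFrame(rows).to_csv("collected_timings.csv", index=False)
```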
- go into the folder `no-nccl`: `cd no-nccl`
- build ChASE: `./build.sh`
- submit jobs: `./submit.sh`
- clean the output files of all jobs: `./clean.sh`
- go into the folder `nccl`: `cd nccl`
- build ChASE: `./build.sh`
- submit jobs: `./submit.sh`
- clean the output files of all jobs: `./clean.sh`
- go into the folder `v1.2.1`: `cd v1.2.1`
- build ChASE: `./build.sh`
- submit jobs: `./submit.sh`
- clean the output files of all jobs: `./clean.sh`
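If it is more convenient to drive all three variants from a single script, the sketch below loops over the folders listed above and runs their `build.sh` and `submit.sh` in sequence. It is only a convenience wrapper around the steps already described, not part of the provided scripts.

```python
# Convenience sketch (not part of the provided scripts): run build.sh and
# submit.sh for all three variants from the top-level folder.
import subprocess

for folder in ["no-nccl", "nccl", "v1.2.1"]:
    subprocess.run(["./build.sh"], cwd=folder, check=True)   # build ChASE
    subprocess.run(["./submit.sh"], cwd=folder, check=True)  # submit all jobs
```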
In the folder `commVScompute`, run `python ./data.py`.
The results will be saved as `../../results/comm_vs_compute_vs_cpy.csv`.
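To inspect the collected data before plotting, the CSV can be loaded with pandas. The snippet below only prints a preview, since the exact column names are defined by `data.py`.

```python
# Quick look at the collected timings (column names are defined by data.py).
import pandas as pd

df = pd.read_csv("../../results/comm_vs_compute_vs_cpy.csv")
print(df.head())
print(df.columns.tolist())
```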
All the Python-based plot scripts are available in the folder `plots`. To generate the plots for this experiment, run `python comm_vs_compt.py`.
The plots `Filter-GPU.pdf`, `QR-GPU.pdf`, `RR-GPU.pdf`, and `Resid-GPU.pdf` will be available in the folder `../../plots/pdf`.