VecKog

The vectorized (AVX-512) batched singular value decomposition algorithm for matrices of order two.

This software is supplementary material for the paper doi:10.1142/S0129626420500152 (arXiv:2005.07403 [cs.MS]).
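For orientation, the SVD of a single real matrix of order two can be written in closed form. The Python sketch below uses a textbook trigonometric formula; it is not the algorithm from the paper, makes no attempt at vectorization or careful scaling, and the function names `svd2x2` and `reconstruct` are illustrative only.

```python
import math

def svd2x2(a, b, c, d):
    """Return (U, (s1, s2), Vt) with [[a, b], [c, d]] = U * diag(s1, s2) * Vt.

    Plain closed-form sketch for real input; extreme magnitudes may
    overflow or underflow because no scaling is applied.
    """
    e, f = (a + d) / 2.0, (a - d) / 2.0
    g, h = (c + b) / 2.0, (c - b) / 2.0
    q, r = math.hypot(e, h), math.hypot(f, g)
    s1, s2 = q + r, q - r          # s2 may be negative at this point
    a1, a2 = math.atan2(g, f), math.atan2(h, e)
    theta, phi = (a2 - a1) / 2.0, (a2 + a1) / 2.0
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    u = [[cp, -sp], [sp, cp]]      # rotation by phi
    vt = [[ct, -st], [st, ct]]     # rotation by theta
    if s2 < 0.0:
        # Make the singular value nonnegative by flipping the sign of
        # the second column of U; the product U * diag * Vt is unchanged.
        s2 = -s2
        u[0][1], u[1][1] = -u[0][1], -u[1][1]
    return u, (s1, s2), vt

def reconstruct(u, s, vt):
    """Multiply U * diag(s) * Vt back together to check a factorization."""
    return [[sum(u[i][k] * s[k] * vt[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]
```

Calling `reconstruct(*svd2x2(a, b, c, d))` should return the input matrix up to rounding error, which gives a quick sanity check on any 2x2 SVD routine.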

Building

Prerequisites

A recent Intel C compiler on a 64-bit Linux (e.g., CentOS 7.8) is required. The Intel MKL (Math Kernel Library) is recommended, but another LAPACK library could work with some tweaking.

Make options

Run make in the src subdirectory as follows:

make [COMPILER=x64x|x200|x64] [MARCH=...] [NDEBUG=optimization_level] [TEST=0..15] [all|clean|help]

where COMPILER should be set to x64x for Xeons or to x200 for Xeon Phi KNLs. NDEBUG should be set to the desired optimization level (3 is a sensible choice); if it is unset, the predefined debug-mode build options are used.

For testing, TEST=0 builds the vectorized code and TEST=4 builds the pointwise code. Adding one to TEST enables the step-by-step printouts, adding two enables the optional backscaling, and adding eight turns on tracking of the IA32_MPERF and IA32_APERF MSRs (which requires running the executables as root). For example, make COMPILER=x200 NDEBUG=3 clean all triggers a full, release-mode rebuild for the KNLs of the vectorized code only (equivalent to passing TEST=0).
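The TEST values read naturally as a sum of independent flags (1, 2, 4, 8) within the range 0..15. The following Python sketch decodes a TEST value under that assumption; the function name `decode_test` and the option labels are hypothetical and do not come from the Makefile.

```python
def decode_test(test):
    """Split a TEST value (0..15) into the four options described in the text.

    Assumes the options combine as independent bit flags, which is an
    interpretation of the README rather than something taken from the sources.
    """
    if not 0 <= test <= 15:
        raise ValueError("TEST must be in 0..15")
    return {
        "printouts": bool(test & 1),     # +1: step-by-step printouts
        "backscaling": bool(test & 2),   # +2: optional backscaling
        "pointwise": bool(test & 4),     # +4: pointwise instead of vectorized code
        "msr_tracking": bool(test & 8),  # +8: IA32_MPERF / IA32_APERF tracking
    }
```

Under this reading, TEST=6 would select the pointwise code with backscaling, and TEST=8 the vectorized code with MSR tracking.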

Running

The test data generator

To write N finite pseudorandom doubles into the file FileName, run:

./src/rndgen.exe N FileName
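To illustrate what "finite pseudorandom doubles" can mean, here is a minimal Python sketch that draws random 64-bit patterns and rejects infinities and NaNs. It assumes the file holds raw 8-byte doubles in native byte order; the actual file format, random source, and distribution used by rndgen.exe are not described here, so treat all names below as hypothetical.

```python
import math
import random
import struct

def finite_random_doubles(n, seed=None):
    """Return n finite doubles obtained from random 64-bit patterns."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        bits = rng.getrandbits(64)
        # Reinterpret the 64 random bits as an IEEE 754 double.
        (x,) = struct.unpack("<d", struct.pack("<Q", bits))
        if math.isfinite(x):  # drop +/-Inf and NaN patterns
            out.append(x)
    return out

def write_doubles(path, values):
    """Write the values as raw doubles in native byte order (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"={len(values)}d", *values))

def read_doubles(path):
    """Read back a file written by write_doubles."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"={len(data) // 8}d", data))
```

Sampling bit patterns rather than a uniform interval covers the whole exponent range, including subnormals, which is useful when stress-testing scaling logic.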

A single-vector algorithm test

To test the real algorithm T (or the complex one, with the second command), where T=TEST, on N vectors from FileName, run:

./src/d8svd2tT.exe N FileName
./src/z8svd2tT.exe N FileName

The multi-batch test

To test the real algorithm T (or the complex one, with the second command), where T=TEST, on #batches batches, each with n matrices read from infile, run:

./src/dbatchT.exe n #batches infile
./src/zbatchT.exe n #batches infile

For now, n has to be a power of two; this is a constraint of the error-testing procedure only, not of the algorithm itself.
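When scripting batch runs, the power-of-two requirement on n can be checked before launching anything. The Python sketch below does that and assembles the argument list in the order shown above; it assumes that T in the executable name is replaced by the numeric TEST value, and the helper names are hypothetical.

```python
def is_power_of_two(n):
    """True if n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0

def batch_command(kind, test, n, batches, infile):
    """Build the argument list for a batch test run.

    kind is "d" (real) or "z" (complex); the executable name is assumed
    to be ./src/<kind>batch<TEST>.exe, following the T=TEST convention.
    """
    if kind not in ("d", "z"):
        raise ValueError("kind must be 'd' or 'z'")
    if not is_power_of_two(n):
        raise ValueError("n must be a power of two")
    return [f"./src/{kind}batch{test}.exe", str(n), str(batches), infile]
```

The returned list can be passed directly to `subprocess.run` once the executables have been built.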

This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).