The Hari-Zimmermann complex generalized hyperbolic SVD and EVD.
A part of the supplementary material for the paper doi:10.1137/19M1277813 (arXiv:1907.08560 [math.NA]).
A recent 64-bit Linux (e.g., CentOS 7.9 with devtoolset-8) or macOS (e.g., Big Sur) is needed.
Then, clone and build JACSD in a directory parallel to this one (i.e., in a sibling directory with the same parent).
Run make as follows:
cd src
make [COMPILER=x64x|x200|gnu|nvidia] [MARCH=...] [NDEBUG=0|1|2|3|...|g] [all|clean|help]
where, for the Intel C/C++ and Fortran compilers, COMPILER should be set to x64x for Xeons, or to x200 for Xeon Phi KNLs.
GNU Fortran 9 and newer are not supported!
Please take a look here for the explanation regarding the MAX and MIN intrinsics.
Currently, only GNU Fortran 8 is fully supported with COMPILER=gnu.
On RHEL/CentOS 7 it is provided by, e.g., devtoolset-8.
Here, NDEBUG should be set to the desired optimization level (3 is a sensible choice).
If unset, the predefined debug-mode build options will be used.
For example, make COMPILER=x200 NDEBUG=3 clean all will trigger a full, release-mode rebuild for the KNLs.
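Because only GNU Fortran 8 is fully supported with COMPILER=gnu, it may help to check the compiler version before a build. A minimal POSIX sh sketch follows; the helper names `major_version` and `check_gfortran8` are hypothetical, and it assumes that the `gfortran` found on the PATH is the one the build will use. On RHEL/CentOS 7, devtoolset-8 is usually enabled with `scl enable devtoolset-8 bash` or by sourcing `/opt/rh/devtoolset-8/enable`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository): refuse to build with
# COMPILER=gnu unless gfortran reports major version 8.
# Assumption: the gfortran on the PATH is the compiler the build will use.

# Print the major component of a version string such as "8.3.1" or "8".
major_version() {
  printf '%s\n' "${1%%.*}"
}

# Succeed only if the gfortran found on the PATH is GNU Fortran 8.
check_gfortran8() {
  v=$(gfortran -dumpversion 2>/dev/null) || {
    echo "gfortran not found" >&2
    return 1
  }
  if [ "$(major_version "$v")" != 8 ]; then
    echo "GNU Fortran $v found, but only GNU Fortran 8 is fully supported" >&2
    return 1
  fi
}

# Example (run from the src directory):
#   check_gfortran8 && make COMPILER=gnu NDEBUG=3 clean all
```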
In the examples below, TPC stands for threads-per-core.
If hyperthreading is not desired, TPC should be set to 1.
FN is the input and output file name prefix (without an extension).
/path/to/phase0.exe input.bin FN
Phase 0 is a data conversion phase from a custom data format to a set of plain binary files.
OMP_NUM_THREADS=T OMP_PLACES=CORES OMP_PROC_BIND=SPREAD,CLOSE /path/to/phase1.exe FN L a G TPC
L, a, and G are the problem-specific parameters.
OMP_NUM_THREADS=T OMP_PLACES=CORES OMP_PROC_BIND=SPREAD,CLOSE /path/to/phase2.exe FN M N TPC
OMP_NUM_THREADS=T OMP_PLACES=CORES OMP_PROC_BIND=SPREAD,CLOSE /path/to/phase3.exe FN M N TPC JSTRAT1 NSWP1 JSTRAT2 NSWP2
JSTRAT1 is the inner, and JSTRAT2 the outer Jacobi strategy.
JSTRAT1 can be 2 for cycwor or 4 for mmstep (recommended).
JSTRAT2 can be 3 for cycwor (recommended if a particular number of threads is supported) or 5 for mmstep.
NSWP1 (1 for block-oriented) and NSWP2 (30 should suffice in most cases) are the maximal numbers of the inner and of the outer sweeps allowed, respectively.
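Since a long run may be wasted on a mistyped strategy code, the phase 3 arguments can be checked against the values listed above before launch. This is a sketch only: `check_phase3_args` is a hypothetical helper, and requiring the sweep limits to be positive integers is an assumption, not a documented constraint.

```shell
#!/bin/sh
# Hypothetical helper: validate the phase 3 Jacobi strategy arguments.
#   JSTRAT1 (inner): 2 = cycwor, 4 = mmstep
#   JSTRAT2 (outer): 3 = cycwor, 5 = mmstep
# Usage: check_phase3_args JSTRAT1 NSWP1 JSTRAT2 NSWP2
check_phase3_args() {
  jstrat1=$1; nswp1=$2; jstrat2=$3; nswp2=$4
  case $jstrat1 in
    2|4) ;;
    *) echo "JSTRAT1 must be 2 or 4, got '$jstrat1'" >&2; return 1 ;;
  esac
  case $jstrat2 in
    3|5) ;;
    *) echo "JSTRAT2 must be 3 or 5, got '$jstrat2'" >&2; return 1 ;;
  esac
  # Sweep limits: assumed (not documented) to be positive integers.
  for n in "$nswp1" "$nswp2"; do
    case $n in
      ''|*[!0-9]*) echo "sweep limit must be a positive integer, got '$n'" >&2; return 1 ;;
    esac
    if [ "$n" -lt 1 ]; then
      echo "sweep limit must be a positive integer, got '$n'" >&2
      return 1
    fi
  done
}

# Example with mmstep inside, cycwor outside, 1 inner and 30 outer sweeps:
#   check_phase3_args 4 1 3 30 &&
#   OMP_NUM_THREADS=T OMP_PLACES=CORES OMP_PROC_BIND=SPREAD,CLOSE \
#     /path/to/phase3.exe FN M N TPC 4 1 3 30
```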
OMP_NUM_THREADS=T OMP_PLACES=CORES OMP_PROC_BIND=SPREAD,CLOSE /path/to/phase4.exe FN N TPC
All data is stored in the Fortran (column-major) array order.
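In column-major order, the element in row i and column j (both 1-based) of a matrix with `rows` rows lies at byte offset ((j-1)*rows + (i-1))*size, where size is the element size in bytes. A minimal sketch for inspecting single elements, assuming headerless files (no Fortran record markers), 16-byte COMPLEX(8) stored as real part then imaginary part, 8-byte REAL(8) and INTEGER(8), and a host with the same byte order as the one that wrote the files; `elem_offset` and `read_complex8` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the column-major binary files.
# Assumptions: no headers or record markers, COMPLEX(8) = 16 bytes
# (real part, then imaginary part), REAL(8) and INTEGER(8) = 8 bytes.

# Byte offset of element (i,j), 1-based, in a column-major matrix
# with the given number of rows and element size in bytes.
elem_offset() {
  i=$1; j=$2; rows=$3; size=$4
  echo $(( ((j - 1) * rows + (i - 1)) * size ))
}

# Print the real and imaginary parts of COMPLEX(8) element (i,j)
# of FILE, whose number of rows is ROWS.
read_complex8() {
  file=$1; i=$2; j=$3; rows=$4
  off=$(elem_offset "$i" "$j" "$rows" 16)
  od -A n -t f8 -j "$off" -N 16 "$file"
}

# Example: element (3,2) of FN.Y, which has G rows:
#   read_complex8 FN.Y 3 2 "$G"
```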
An example of the data format of the test cases:
| file name | data type | rows | columns |
|---|---|---|---|
| FN.X | COMPLEX(8) | `2*L*a` | G |
| FN.T | COMPLEX(8) | `2*L` | `2*L` |
| FN.U | REAL(8) | `L*a` | 1 |
| FN.YY | COMPLEX(8) | `2*L*a` | G |
| FN.WW | COMPLEX(8) | `2*L*a` | G |
| FN.JJ | INTEGER(8) | `2*L*a` | 1 |
| FN.Y | COMPLEX(8) | G | G |
| FN.W | COMPLEX(8) | G | G |
| FN.J | INTEGER(8) | G | 1 |
| FN.P | INTEGER(8) | G | 1 |
| FN.O | INTEGER(8) | G | 1 |
| FN.YU | COMPLEX(8) | G | G |
| FN.WV | COMPLEX(8) | G | G |
| FN.Z | COMPLEX(8) | G | G |
| FN.EY | REAL(8) | G | 1 |
| FN.EW | REAL(8) | G | 1 |
| FN.E | REAL(8) | G | 1 |
| FN.SY | REAL(8) | G | 1 |
| FN.SW | REAL(8) | G | 1 |
| FN.SS | REAL(8) | G | 1 |
| FN.ZZ | COMPLEX(8) | G | G |
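The table also determines how large each file should be, which gives a cheap sanity check between phases. A minimal sketch, assuming headerless files, 16-byte COMPLEX(8), and 8-byte REAL(8) and INTEGER(8) (the usual sizes with GNU and Intel Fortran); `expected_size` and `check_sizes` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers: expected sizes of the data files in the table.
# Assumptions: headerless files, COMPLEX(8) = 16 bytes,
# REAL(8) and INTEGER(8) = 8 bytes.

# expected_size SUFFIX L a G  ->  size of FN.SUFFIX in bytes
expected_size() {
  L=$2; a=$3; G=$4
  case $1 in
    X|YY|WW)                echo $(( 16 * (2 * L * a) * G )) ;;
    T)                      echo $(( 16 * (2 * L) * (2 * L) )) ;;
    U)                      echo $(( 8 * (L * a) )) ;;
    JJ)                     echo $(( 8 * (2 * L * a) )) ;;
    Y|W|YU|WV|Z|ZZ)         echo $(( 16 * G * G )) ;;
    J|P|O|EY|EW|E|SY|SW|SS) echo $(( 8 * G )) ;;
    *) echo "unknown file suffix: $1" >&2; return 1 ;;
  esac
}

# check_sizes FN L a G SUFFIX...  ->  report missing or mis-sized files
check_sizes() {
  fn=$1; L=$2; a=$3; G=$4; shift 4
  status=0
  for sfx in "$@"; do
    want=$(expected_size "$sfx" "$L" "$a" "$G") || { status=1; continue; }
    if [ ! -f "$fn.$sfx" ]; then
      echo "$fn.$sfx: missing" >&2; status=1; continue
    fi
    # Arithmetic expansion strips the padding some wc versions print.
    have=$(( $(wc -c < "$fn.$sfx") ))
    if [ "$have" -ne "$want" ]; then
      echo "$fn.$sfx: $have bytes, expected $want" >&2; status=1
    fi
  done
  return $status
}

# Example, after phase 0:  check_sizes FN "$L" "$a" "$G" X T U
```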
Phase 0. Output: FN.X, FN.T, FN.U.
Phase 1. Input: FN.X, FN.T, FN.U. Output: FN.YY, FN.WW, FN.JJ.
Phase 2. Input: FN.YY, FN.WW, FN.JJ. Output: FN.Y, FN.W, FN.J, FN.P, FN.O.
Phase 3. Input: FN.Y, FN.W, FN.J. Output: FN.YU, FN.WV, FN.Z; FN.EY, FN.EW, FN.E; FN.SY, FN.SW, FN.SS.
Phase 4. Input: FN.Z. Output: FN.ZZ.
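Each phase reads files written by the previous one, so the whole pipeline can be driven by a short script that checks a phase's inputs before running it. This is a hypothetical sketch: the helper names, the BIN directory variable, and the placeholder values are assumptions, and M and N must be chosen as the phase 2, 3, and 4 executables expect.

```shell
#!/bin/sh
# Hypothetical driver helpers (not part of this repository).
# The input files of each phase follow the phase list above.

# phase_inputs N  ->  file suffixes that phase N reads
# (phase 0 reads only input.bin, so it prints nothing)
phase_inputs() {
  case $1 in
    0) ;;
    1) echo X T U ;;
    2) echo YY WW JJ ;;
    3) echo Y W J ;;
    4) echo Z ;;
    *) echo "unknown phase: $1" >&2; return 1 ;;
  esac
}

# check_inputs N FN  ->  fail if any input file of phase N is missing
check_inputs() {
  sfxs=$(phase_inputs "$1") || return 1
  for sfx in $sfxs; do
    if [ ! -f "$2.$sfx" ]; then
      echo "phase $1: missing input $2.$sfx" >&2
      return 1
    fi
  done
}

# run_omp CMD...  ->  run an executable with the OpenMP settings used above;
# T must be set by the caller.
run_omp() {
  OMP_NUM_THREADS=$T OMP_PLACES=CORES OMP_PROC_BIND=SPREAD,CLOSE "$@"
}

# Example (BIN, FN, T, TPC, L, a, G, M, N are placeholders to fill in;
# JSTRAT2=3 assumes the thread count is supported, use 5 otherwise):
#   "$BIN/phase0.exe" input.bin "$FN" &&
#   check_inputs 1 "$FN" && run_omp "$BIN/phase1.exe" "$FN" "$L" "$a" "$G" "$TPC" &&
#   check_inputs 2 "$FN" && run_omp "$BIN/phase2.exe" "$FN" "$M" "$N" "$TPC" &&
#   check_inputs 3 "$FN" && run_omp "$BIN/phase3.exe" "$FN" "$M" "$N" "$TPC" 4 1 3 30 &&
#   check_inputs 4 "$FN" && run_omp "$BIN/phase4.exe" "$FN" "$N" "$TPC"
```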
This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).