mikerabat edited this page Sep 27, 2023 · 7 revisions

Linear Algebra

Most of the linear algebra functionality is a direct conversion from the Numerical Recipes in C book (second edition), so the basics are quite simple and can be looked up there. The routines translated from there are:

  • Singular value decomposition (only as a reference implementation in LinAlgSVD.pas)
  • QR (only a reference implementation in LinAlgQR.pas)
  • Cholesky (only a reference implementation in LinAlgCholesky.pas)
  • LU (only a reference implementation in LinAlgLU.pas)
  • Gauss-Jordan elimination (though not exposed in the matrix class)

Since these original algorithms are not cache oblivious, their performance may not be as good as that of the other matrix operations.

Note that Gauss-Jordan elimination is the only routine from NR left in the matrix class. All others have been superseded by the faster cache-oblivious and panel-based algorithms.

LU Decomposition

An exception here is the LU decomposition, which is based on a recursive implementation of the standard LU decomposition. The algorithm splits the matrix into two rectangles (m x n/2) until the left part ends up being only a single column. After that operation the matrix contains 4 parts:

A = [ L   A12 ]
    [ A21 A22 ]

We then apply transformations to the right part (solve A12 = L^-1*A12 and A22 = A22 - A21*A12) and apply the LU algorithm recursively to A22. Most of the work in the algorithm is done within the matrix multiplication, which can profit from multiple cores.

In the library itself the matrix inversion and matrix determinant calculation are based on the LU decomposition.
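The splitting scheme above can be sketched in plain Python (a minimal illustration with a hypothetical `lu_inplace` helper, not the library's Pascal code; pivoting and the optimized multiplication kernels are omitted for brevity):

```python
def lu_inplace(A):
    # Recursive in-place LU factorization (pivoting omitted for brevity,
    # so A must have nonzero leading minors). On return, A packs the unit
    # lower triangular factor L (below the diagonal) and U (on/above it).
    n = len(A)

    def rec(r, c, h, w):
        # factor the h x w submatrix whose top-left corner is (r, c)
        if w == 1:
            # single column left: scale the entries below the pivot
            for i in range(r + 1, r + h):
                A[i][c] /= A[r][c]
            return
        w1 = w // 2
        rec(r, c, h, w1)                      # LU of the left panel
        # A12 := L11^-1 * A12 (forward substitution, unit diagonal)
        for j in range(c + w1, c + w):
            for i in range(1, w1):
                for k in range(i):
                    A[r + i][j] -= A[r + i][c + k] * A[r + k][j]
        # A22 := A22 - A21 * A12 (the dominating matrix multiplication)
        for i in range(w1, h):
            for j in range(c + w1, c + w):
                A[r + i][j] -= sum(A[r + i][c + k] * A[r + k][j]
                                   for k in range(w1))
        rec(r + w1, c + w1, h - w1, w - w1)   # recurse on the trailing block

    rec(0, 0, n, n)
```

The dominant cost is the A22 update, which is exactly the matrix multiplication that can profit from multiple cores.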

QR Decomposition

In addition, the QR algorithm has been updated such that it is cache oblivious and uses a panel-based approach similar to the one used in LAPACK (netlib.org). The matrix multiplications used in that algorithm make use of multiple cores, meaning that most of the new implementation can benefit from that. The algorithm needs some additional memory to store intermediate results (netlib: T matrix and W matrix): panelSize*height*sizeof(double) + QRMultBlockSize*QRMultBlockSize*sizeof(double).

The base algorithm is:

  • Apply qr decomposition to one panel
  • Create diagonal update matrix T
  • Apply the block reflector, which allows us to use some optimized matrix multiplication routines.
  • Go to the next panel.

You can adjust the size of the panel and the QR decomposition block size with the variables

  • QRBlockSize = 24
  • QRMultBlockSize = 64

defined in BlockSizeSetup.pas .
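For illustration, here is a minimal unblocked Householder QR in Python (a hypothetical `householder_qr` sketch, not the library's code; the panel-based version groups several reflectors into the T matrix so they can be applied as one block reflector via matrix multiplication):

```python
import math

def householder_qr(A):
    # Unblocked Householder QR: returns (Q, R) with A = Q * R,
    # Q orthogonal (m x m) and R upper triangular (m x n).
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m, n)):
        # build the Householder vector that zeroes R[k+1:, k]
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue
        alpha = -math.copysign(norm, R[k][k])
        v = [0.0] * m
        v[k] = R[k][k] - alpha
        for i in range(k + 1, m):
            v[i] = R[i][k]
        vnorm2 = sum(x * x for x in v)
        if vnorm2 == 0.0:
            continue
        # apply H = I - 2 v v^T / (v^T v) to R from the left ...
        for j in range(n):
            f = 2.0 * sum(v[i] * R[i][j] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                R[i][j] -= f * v[i]
        # ... and accumulate Q := Q * H (H is symmetric)
        for r in range(m):
            f = 2.0 * sum(Q[r][i] * v[i] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                Q[r][i] -= f * v[i]
    return Q, R
```

The panel algorithm performs exactly these reflections, but delays them per panel so that the bulk of the work lands in large matrix multiplications.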

Cholesky Decomposition

The Cholesky decomposition has been updated to work on a panel basis instead of single lines. The output is a lower triangular matrix: the original input in the lower triangle is overwritten, while the upper triangle is not touched. The Cholesky decomposition comes in 3 flavours:

  • Single line based
  • Block based
  • Recursive

The version used in the matrix class is the blocked one, which utilizes the highly optimized matrix multiplication as well as the threaded matrix multiplication (in the threaded matrix object).

The panel size can be adjusted via the global variable CholBlockSize (default = 24) in BlockSizeSetup.pas.
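The single line based flavour corresponds to the classic row-by-row recurrence, sketched here in Python (a hypothetical `cholesky_lower` helper, not the library's code; the blocked and recursive variants produce the same factor but organize the work into panels and matrix multiplications):

```python
import math

def cholesky_lower(A):
    # Row-by-row Cholesky factorization: A = L * L^T for a symmetric
    # positive definite A. Only the lower triangle of A is read,
    # mirroring the behaviour described above.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # diagonal entry: sqrt of the remaining pivot
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                # off-diagonal entry: divide by the earlier diagonal
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L
```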

Singular Value Decomposition

The new version of the simple SVD implemented in the library is based on a faster bidiagonal reduction and singular value search, so it can make use of the highly optimized matrix multiplication routines. In case the matrix is much larger in one dimension, a QR decomposition is carried out first. The outputs are the left and right singular vectors U and V, whereas V is transposed in the new implementation. So the output is:

  • U S V in the single-line algorithm
  • U S V' in the new algorithm

The version used in the matrix class is the new algorithm, so be aware that it returns the transposed right singular vectors. In case you want to use the multithreaded version, which essentially implements the threaded QR decomposition, threaded matrix multiplication and threaded matrix rotations, use the threaded matrix object 'TThreadedMatrix' in ThreadedMatrix.pas.

You can find the two versions in LinAlgSVD.pas:

  • MatrixSVDInPlace: the single line SVD
  • MatrixSVDInPlace2: the block based SVD; returns the transposed right singular vectors in V

The panel size can be adjusted via the global variables SVDBlockSize (default = 24), QRBlockSize (default = 24) and QRMultBlockSize (default = 64) in BlockSizeSetup.pas.
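To illustrate the U*S*V' identity independently of the bidiagonal algorithm described above, here is a compact one-sided Jacobi SVD in Python (a hypothetical `jacobi_svd` sketch, deliberately not the library's bidiagonal-reduction approach):

```python
import math

def jacobi_svd(A, sweeps=30, tol=1e-12):
    # One-sided Jacobi SVD for an m x n matrix (m >= n): rotate column
    # pairs of a working copy U until all pairs are orthogonal, while
    # accumulating the same rotations in V. Then the column norms of U
    # are the singular values and A = U * diag(s) * V^T.
    m, n = len(A), len(A[0])
    U = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = 0.0
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(U[i][p] * U[i][p] for i in range(m))
                beta = sum(U[i][q] * U[i][q] for i in range(m))
                gamma = sum(U[i][p] * U[i][q] for i in range(m))
                off = max(off, abs(gamma))
                if abs(gamma) < tol:
                    continue
                # Jacobi rotation that orthogonalizes columns p and q
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for M in (U, V):
                    for i in range(len(M)):
                        mp, mq = M[i][p], M[i][q]
                        M[i][p] = c * mp - s * mq
                        M[i][q] = s * mp + c * mq
        if off < tol:
            break
    s_vals = [math.sqrt(sum(U[i][j] ** 2 for i in range(m))) for j in range(n)]
    for j in range(n):
        if s_vals[j] > tol:
            for i in range(m):
                U[i][j] /= s_vals[j]
    return U, s_vals, V  # A ~ U * diag(s_vals) * V^T
```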

Example

Simple usage of the SVD and test if the identity 'X = U * S * VT' holds.

procedure TestSVD;
const cA : Array[0..11] of double = (1, 2, 3, 4, 5, 4, 3, 1, 1, 2, 1, 2);
var a : TDoubleMatrix;
    u, w, v : TDoubleMatrix;
begin
     a := TDoubleMatrix.Create(4, 3);
     a.Assign(cA, 4, 3);

     a.SVD(u, v, w, False);

     Assert((u.Width = 3) and (u.Height = 3), 'Error U has the wrong dimension');
     Assert((w.Width = 3) and (w.Height = 3), 'Error W has the wrong dimension');
     Assert((v.Width = 4) and (v.Height = 3), 'Error V has the wrong dimension');

     // check if U*W*V' = A
     w.MultInPlace(v);
     u.MultInPlace(w);

     Assert( CheckMtx(a.SubMatrix, u.SubMatrix), 'Error U*W*V'' is not A');

     a.Free;
     u.Free;
     v.Free;
     w.Free;
end;

Hessenberg Decomposition

A new version of the Hessenberg decomposition is now available. The library calculates the upper Hessenberg matrix: an upper triangular matrix plus the first subdiagonal.

You can find two versions in Eigensystems.pas:

  • MatrixHessenberg2InPlace: calculates the upper Hessenberg matrix from an n by n matrix and replaces the original input. The algorithm utilizes a block-wise approach.
  • ThrMtxHessenberg2InPlace: a threaded version of the above function. Basically, the long running elements (block multiplications) are replaced by threaded versions.

The unit also contains the full version of the Hessenberg transformation in the form Q*H*Q' = A.

The block size can be changed via the global variable HessBlockSize (in BlockSizeSetup.pas); in case the full transformation is requested, the block size for the matrix Q can be set in QRBlockSize. The defaults are 32 for both variables.
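A minimal Householder-based reduction in Python illustrates the Q*H*Q' = A convention (a hypothetical `hessenberg` helper, not the library's code; the block-wise version organizes the same reflectors into panels so most of the work lands in matrix multiplications):

```python
import math

def hessenberg(A):
    # Householder reduction of a square A to upper Hessenberg form H
    # (upper triangular plus the first subdiagonal), accumulating an
    # orthogonal Q such that Q * H * Q^T = A.
    n = len(A)
    H = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        # reflector that zeroes H[k+2:, k], leaving the subdiagonal
        norm = math.sqrt(sum(H[i][k] ** 2 for i in range(k + 1, n)))
        if norm == 0.0:
            continue
        alpha = -math.copysign(norm, H[k + 1][k])
        v = [0.0] * n
        v[k + 1] = H[k + 1][k] - alpha
        for i in range(k + 2, n):
            v[i] = H[i][k]
        vnorm2 = sum(x * x for x in v)
        if vnorm2 == 0.0:
            continue
        # similarity transform H := P * H * P with P = I - 2 v v^T / (v^T v)
        for j in range(n):   # left: H := P * H
            f = 2.0 * sum(v[i] * H[i][j] for i in range(k + 1, n)) / vnorm2
            for i in range(k + 1, n):
                H[i][j] -= f * v[i]
        for r in range(n):   # right: H := H * P
            f = 2.0 * sum(H[r][i] * v[i] for i in range(k + 1, n)) / vnorm2
            for i in range(k + 1, n):
                H[r][i] -= f * v[i]
        for r in range(n):   # accumulate Q := Q * P
            f = 2.0 * sum(Q[r][i] * v[i] for i in range(k + 1, n)) / vnorm2
            for i in range(k + 1, n):
                Q[r][i] -= f * v[i]
    return Q, H
```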

Example

procedure TestHess;
const cA : Array[0..24] of double = (3, 1, 1, 2, -2, -1, 5, -1, 8, 0, 1, -1, 1, 4, 3, 2.2, 1, 0, -1, -1, 6, -6, -2, -2, 4);
      cHess : Array[0..24] of double = (3.00000,  -1.16115,   0.01819,   0.46244,  -2.90474,
                                        6.54523,   3.23436,   1.99892,   2.70807,   6.29351,
                                        0,   -5.19787,  -2.68521,   0.34935,   4.62237,
                                        0,   0,  -1.96271,   2.80563,   3.66568,
                                        0,   0,   0,   2.41975,   5.64523 );
      cQ : Array[0..24] of double = ( 1.00000,   0.00000,   0.00000,   0.00000,   0.00000,
                                      0.00000,  -0.15278,  -0.43603,  -0.78587,  -0.41103,
                                      0.00000,   0.15278,  -0.75146,   0.55545,  -0.32161,
                                      0.00000,   0.33612,   0.47957,   0.08991,  -0.80557,
                                      0.00000,   0.91670,  -0.12327,  -0.25652,   0.28047 );
var a, h1, h2, q : TDoubleMatrix;
begin
     a := TDoubleMatrix.Create(cA, 5, 5);
     try
        a.Hess( h1 );
        a.HessFull(q, h2);

        Check( CheckMtx(h1.SubMatrix, h2.SubMatrix), 'Full and single hess decomp is not the same');

        Check( CheckMtx( cHess, h1.SubMatrix ), 'Failed to calculate hess decomp');
        Check( CheckMtx( cQ, q.SubMatrix ), 'Failed to calculate hess decomp');
     finally
            a.Free;
            q.Free;
            h1.Free;
            h2.Free;
     end;
end;