Add Eigen as an optional backend for Vector and Array classes #307

Open
1 task done
zasexton opened this issue Dec 3, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@zasexton
Contributor

zasexton commented Dec 3, 2024

Use Case

For the Vector.h and Array.h files we currently implement custom Vector and Array objects. Eigen is a high-level C++ library of template headers for linear algebra, matrix and vector operations, geometrical transformations, numerical solvers and related algorithms. Often Eigen or another such robust linear algebra library is used within large scientific computation or industry projects for clean, clear, and optimized data abstractions. Ultimately, leveraging Eigen can allow for many routines in svMultiPhysics to be expressed algebraically while accelerating computation with library vectorization.

Problem

The custom Vector.h and Array.h classes offer straightforward but non-optimized mathematical operations and lack advanced linear algebra functionality. These implementations typically require manual management of memory and computations, which increases code complexity and raises the risk of bugs and numerical inaccuracies. Additionally, these custom classes may struggle with scalability, making it difficult to handle large datasets or perform complex operations efficiently. This may result in code redundancy, performance inefficiencies, and limited extensibility of the current code base.

Solution

  • Add a configuration flag USE_EIGEN to the CMake build so that Vector.h and Array.h can optionally be built using Eigen::Map objects.
  • Add Eigen linear algebra algorithms to the mat_fun.h/mat_fun.cpp files.
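
A minimal sketch of how such a compile-time switch might look inside a mat_fun-style routine. The `dot` helper is hypothetical and is not taken from the branch; it only assumes that the build defines a `USE_EIGEN` preprocessor macro when the option is turned on. With the macro defined, the caller-owned buffers are wrapped in `Eigen::Map`; without it, the plain loop is used and Eigen is not needed at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_EIGEN
#include <Eigen/Dense>
#endif

// Hypothetical helper: dot product over caller-owned contiguous buffers.
// The storage stays with the caller in both configurations.
inline double dot(const double* a, const double* b, std::size_t n)
{
#ifdef USE_EIGEN
  // Non-owning views over the existing memory; nothing is copied.
  Eigen::Map<const Eigen::VectorXd> va(a, static_cast<Eigen::Index>(n));
  Eigen::Map<const Eigen::VectorXd> vb(b, static_cast<Eigen::Index>(n));
  return va.dot(vb);
#else
  // Fallback: straightforward loop, no external dependency.
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
#endif
}

// Convenience overload for std::vector inputs of equal length.
inline double dot(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size());
  return dot(a.data(), b.data(), a.size());
}
```

Keeping the switch inside the function body means callers see one signature regardless of which backend was compiled in.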

Alternatives considered

Eigen seems the most convenient library to add as a backend; however, other libraries such as Boost might be considered with this same interfacing scheme.

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct and Contributing Guidelines
@zasexton zasexton added the enhancement New feature or request label Dec 3, 2024
@zasexton
Contributor Author

zasexton commented Dec 4, 2024

The branch for adding the optional Eigen backend is given here: https://github.com/zasexton/svFSIplus_public/tree/update-array-with-eigen-v2 . I will be testing the performance of Array and Vector for both the Eigen backend and the original custom implementations.

Note that with the scheme we are initially pursuing, we will still be managing the memory allocation and the assignment of data ourselves. Eigen will be used through Eigen::Map objects to perform operations on that data.
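
To illustrate that ownership split, here is a small sketch under stated assumptions. `OwnedArray` is a hypothetical stand-in, not the project's Array.h, and the column-major layout is assumed so that it matches Eigen's default storage order. The class allocates and owns the buffer; when `USE_EIGEN` is defined, an operation creates a short-lived `Eigen::Map` view that writes straight into that buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_EIGEN
#include <Eigen/Dense>
#endif

// Hypothetical stand-in for a custom 2D array class; not the project's Array.h.
// Storage is column-major (an assumption), matching Eigen's default layout.
class OwnedArray {
 public:
  OwnedArray(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

  // Element access with column-major indexing.
  double& operator()(std::size_t i, std::size_t j)
  {
    assert(i < rows_ && j < cols_);
    return data_[i + j * rows_];
  }

  // Scale every entry in place. The class keeps ownership of data_;
  // with USE_EIGEN the work is done through a temporary non-owning view.
  void scale(double s)
  {
#ifdef USE_EIGEN
    Eigen::Map<Eigen::MatrixXd> view(data_.data(),
                                     static_cast<Eigen::Index>(rows_),
                                     static_cast<Eigen::Index>(cols_));
    view *= s;  // writes directly into data_, no copy
#else
    for (double& x : data_) {
      x *= s;
    }
#endif
  }

 private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<double> data_;
};
```

Because the view holds only a pointer and the dimensions, it must not outlive the buffer it wraps; creating it inside the member function keeps that lifetime obvious.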

@zasexton
Contributor Author

zasexton commented Dec 4, 2024

Initial timing performance of Eigen backend for Array.h only:

time /mnt/e/svFSIplus_public/build/svFSI-build/bin/svFSI svFSIplus.xml
---------------------------------------------------------------------
 Eq     N-i     T       dB  Ri/R1   Ri/R0    R/Ri     lsIt   dB  %t
---------------------------------------------------------------------
 NS 1-1  7.830e-01  [0 1.000e+00 1.000e+00 8.349e-12]  [158 -255 50]
 NS 1-2  2.038e+00  [-57 1.372e-03 1.372e-03 1.530e-11]  [253 -106 75]
 NS 1-3  2.574e+00  [-125 5.068e-07 5.068e-07 1.716e-05]  [117 -110 47]
 NS 1-4  2.853e+00  [-197 1.300e-10 1.300e-10 6.859e-02]  [7 -27 1]
 NS 1-5  3.138e+00  [-220 8.919e-12 8.919e-12 1.000e+00]  !0 0 0!
 NS 2-1  4.367e+00  [0 1.000e+00 2.856e+01 9.945e-13]  [283 -129 79]
 NS 2-2  4.960e+00  [-75 1.586e-04 4.529e-03 1.945e-09]  [143 -201 55]
 NS 2-3  5.593e+00  [-146 4.871e-08 1.391e-06 6.474e-06]  [123 -119 48]
 NS 2-4  5.934e+00  [-216 1.483e-11 4.234e-10 1.795e-02]  [11 -40 2]
 NS 2-5s 6.297e+00  [-251 2.663e-13 7.606e-12 1.000e+00]  !0 0 0!

real    0m7.003s
user    0m15.438s
sys     0m7.609s

Performance of Original Array.h class:

---------------------------------------------------------------------
 Eq     N-i     T       dB  Ri/R1   Ri/R0    R/Ri     lsIt   dB  %t
---------------------------------------------------------------------
 NS 1-1  8.900e-01  [0 1.000e+00 1.000e+00 8.349e-12]  [158 -255 51]
 NS 1-2  2.344e+00  [-57 1.372e-03 1.372e-03 1.530e-11]  [253 -106 74]
 NS 1-3  2.936e+00  [-125 5.068e-07 5.068e-07 1.716e-05]  [117 -110 45]
 NS 1-4  3.262e+00  [-197 1.300e-10 1.300e-10 6.859e-02]  [7 -27 2]
 NS 1-5  3.597e+00  [-220 8.919e-12 8.919e-12 1.000e+00]  !0 0 0!
 NS 2-1  4.947e+00  [0 1.000e+00 2.856e+01 9.945e-13]  [283 -129 76]
 NS 2-2  5.670e+00  [-75 1.586e-04 4.529e-03 1.945e-09]  [143 -201 56]
 NS 2-3  6.275e+00  [-146 4.871e-08 1.391e-06 6.474e-06]  [123 -119 47]
 NS 2-4  6.607e+00  [-216 1.483e-11 4.234e-10 1.795e-02]  [11 -40 2]
 NS 2-5s 6.965e+00  [-251 2.663e-13 7.606e-12 1.000e+00]  !0 0 0!

real    0m7.762s
user    0m16.813s
sys     0m7.828s

For the initial test case we use the fluid/pipe_RCR_3d with 1 processor.

@ktbolt
Collaborator

ktbolt commented Dec 5, 2024

@zasexton Very nice!

Any idea about memory usage?

@zasexton
Contributor Author

zasexton commented Dec 5, 2024

The full memory reports from valgrind are attached to this comment. Because we only use Eigen::Map as a wrapper around the raw pointers managed by the custom arrays and vectors, we do not expect this approach to incur much additional memory allocation. As before, this memory analysis was performed on the test case fluid/pipe_RCR_3d with a single processor. The heap summaries are highlighted below:

Original:

==716== HEAP SUMMARY:
==716==     in use at exit: 7,064,927 bytes in 1,035 blocks
==716==   total heap usage: 8,928,654 allocs, 8,927,619 frees, 1,520,588,974 bytes allocated

Eigen Backend

==30014== HEAP SUMMARY:
==30014==     in use at exit: 7,064,929 bytes in 1,035 blocks
==30014==   total heap usage: 8,928,654 allocs, 8,927,619 frees, 1,520,589,044 bytes allocated

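From this report we notice comparable memory allocation: the number of allocations and frees is identical, and the total bytes allocated differ by only 70 bytes. Full reports: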
valgrind-eigen-output.txt
valgrind-non-eigen-output.txt

@zasexton
Contributor Author

zasexton commented Dec 9, 2024

Initial performance testing of the mat_fun.cpp linear algebra operations with the optional Eigen backend yields the following preliminary results:

Original Backend Implementation:
------------------------------------------------
[==========] Running 22 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 22 tests from MatFunAllOpsPerfTest
[ RUN      ] MatFunAllOpsPerfTest.MatDdotPerformance
[mat_ddot] Eigen: 0 Time: 0.0001959 sec
[       OK ] MatFunAllOpsPerfTest.MatDdotPerformance (4 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDetPerformance
[mat_det] Eigen: 0 Time: 0.245838 sec
[       OK ] MatFunAllOpsPerfTest.MatDetPerformance (246 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDevPerformance
[mat_dev] Eigen: 0 Time: 0.0038307 sec
[       OK ] MatFunAllOpsPerfTest.MatDevPerformance (6 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDyadProdPerformance
[mat_dyad_prod] Eigen: 0 Time: 0.0001899 sec
[       OK ] MatFunAllOpsPerfTest.MatDyadProdPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatIdPerformance
[mat_id] Eigen: 0 Time: 0.0030137 sec
[       OK ] MatFunAllOpsPerfTest.MatIdPerformance (6 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatInvPerformance
[mat_inv] Eigen: 0 Time: 0.434947 sec
[       OK ] MatFunAllOpsPerfTest.MatInvPerformance (436 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMulMatrixPerformance
[mat_mul(Matrix*Matrix)] Eigen: 0 Time: 0.0539665 sec
[       OK ] MatFunAllOpsPerfTest.MatMulMatrixPerformance (55 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMulMatrixVectorPerformance
[mat_mul(Matrix*Vector)] Eigen: 0 Time: 0.0272071 sec
[       OK ] MatFunAllOpsPerfTest.MatMulMatrixVectorPerformance (132 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatSymmPerformance
[mat_symm] Eigen: 0 Time: 0.0002937 sec
[       OK ] MatFunAllOpsPerfTest.MatSymmPerformance (2 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatTracePerformance
[mat_trace] Eigen: 0 Time: 2.8e-06 sec
[       OK ] MatFunAllOpsPerfTest.MatTracePerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TransposePerformance
[transpose] Eigen: 0 Time: 0.0292776 sec
[       OK ] MatFunAllOpsPerfTest.TransposePerformance (38 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenInitPerformance
[ten_init] Eigen: 0 Time: 3.53e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenInitPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenAsymProd12Performance
[ten_asym_prod12] Eigen: 0 Time: 3.2e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenAsymProd12Performance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdotPerformance
[ten_ddot] Eigen: 0 Time: 3.5e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenDdotPerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdot2412Performance
[ten_ddot_2412] Eigen: 0 Time: 2.4e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenDdot2412Performance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdot3424Performance
[ten_ddot_3424] Eigen: 0 Time: 2.6e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenDdot3424Performance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDyadProdPerformance
[ten_dyad_prod] Eigen: 0 Time: 1.9e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenDyadProdPerformance (6 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenIdsPerformance
[ten_ids] Eigen: 0 Time: 1.9e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenIdsPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenMddotPerformance
[ten_mddot] Eigen: 0 Time: 1.5e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenMddotPerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenSymmProdPerformance
[ten_symm_prod] Eigen: 0 Time: 2.5e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenSymmProdPerformance (5 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenTransposePerformance
[ten_transpose] Eigen: 0 Time: 1.9e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenTransposePerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMul6x3Performance
[mat_mul6x3] Eigen: 0 Time: 1.3e-06 sec
[       OK ] MatFunAllOpsPerfTest.MatMul6x3Performance (0 ms)
[----------] 22 tests from MatFunAllOpsPerfTest (1007 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 1 test suite ran. (1011 ms total)
[  PASSED  ] 22 tests.


------------------------------------------------
Eigen Backend Implementation:
------------------------------------------------
[==========] Running 22 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 22 tests from MatFunAllOpsPerfTest
[ RUN      ] MatFunAllOpsPerfTest.MatDdotPerformance
[mat_ddot] Eigen: 1 Time: 0.0001006 sec
[       OK ] MatFunAllOpsPerfTest.MatDdotPerformance (4 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDetPerformance
[mat_det] Eigen: 1 Time: 7.3e-05 sec
[       OK ] MatFunAllOpsPerfTest.MatDetPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDevPerformance
[mat_dev] Eigen: 1 Time: 0.0021213 sec
[       OK ] MatFunAllOpsPerfTest.MatDevPerformance (6 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDyadProdPerformance
[mat_dyad_prod] Eigen: 1 Time: 0.0018783 sec
[       OK ] MatFunAllOpsPerfTest.MatDyadProdPerformance (5 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatIdPerformance
[mat_id] Eigen: 1 Time: 0.0054104 sec
[       OK ] MatFunAllOpsPerfTest.MatIdPerformance (12 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatInvPerformance
[mat_inv] Eigen: 1 Time: 0.0201402 sec
[       OK ] MatFunAllOpsPerfTest.MatInvPerformance (23 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMulMatrixPerformance
[mat_mul(Matrix*Matrix)] Eigen: 1 Time: 0.0133074 sec
[       OK ] MatFunAllOpsPerfTest.MatMulMatrixPerformance (15 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMulMatrixVectorPerformance
[mat_mul(Matrix*Vector)] Eigen: 1 Time: 0.0024791 sec
[       OK ] MatFunAllOpsPerfTest.MatMulMatrixVectorPerformance (88 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatSymmPerformance
[mat_symm] Eigen: 1 Time: 0.0001658 sec
[       OK ] MatFunAllOpsPerfTest.MatSymmPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatTracePerformance
[mat_trace] Eigen: 1 Time: 2.5e-06 sec
[       OK ] MatFunAllOpsPerfTest.MatTracePerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TransposePerformance
[transpose] Eigen: 1 Time: 0.025343 sec
[       OK ] MatFunAllOpsPerfTest.TransposePerformance (34 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenInitPerformance
[ten_init] Eigen: 1 Time: 5.63e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenInitPerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenAsymProd12Performance
[ten_asym_prod12] Eigen: 1 Time: 7.82e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenAsymProd12Performance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdotPerformance
[ten_ddot] Eigen: 1 Time: 9.23e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenDdotPerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdot2412Performance
[ten_ddot_2412] Eigen: 1 Time: 1.23e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenDdot2412Performance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdot3424Performance
[ten_ddot_3424] Eigen: 1 Time: 3.8e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenDdot3424Performance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDyadProdPerformance
[ten_dyad_prod] Eigen: 1 Time: 7.2e-06 sec
[       OK ] MatFunAllOpsPerfTest.TenDyadProdPerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenIdsPerformance
[ten_ids] Eigen: 1 Time: 3.92e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenIdsPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenMddotPerformance
[ten_mddot] Eigen: 1 Time: 3.6e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenMddotPerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenSymmProdPerformance
[ten_symm_prod] Eigen: 1 Time: 2.58e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenSymmProdPerformance (4 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenTransposePerformance
[ten_transpose] Eigen: 1 Time: 1.67e-05 sec
[       OK ] MatFunAllOpsPerfTest.TenTransposePerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMul6x3Performance
[mat_mul6x3] Eigen: 1 Time: 2.1e-06 sec
[       OK ] MatFunAllOpsPerfTest.MatMul6x3Performance (5 ms)
[----------] 22 tests from MatFunAllOpsPerfTest (291 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 1 test suite ran. (299 ms total)
[  PASSED  ] 22 tests.

@zasexton
Contributor Author

zasexton commented Dec 9, 2024

We see that the Eigen backend outperforms the original implementation for mat_det, mat_mul(Matrix*Matrix), mat_mul(Matrix*Vector), and mat_inv. Tensor evaluations perform at comparable speeds, but it is important to note that the nominal size of the tensors tested was nd=3.

@zasexton
Contributor Author

zasexton commented Dec 9, 2024

To thoroughly test the accuracy, memory, and speed performance of the linear algebra files, I will be testing on this selection of files from the larger svMultiPhysics repo. This allows me to easily add unit tests and speed evaluations (some test cases have already been included here). Once I am convinced that the linear algebra library created here is sufficiently performant, I will seek to reintegrate and test it with the rest of the sv infrastructure. For now, I require a reduced set of code to develop on.

svMultiPhysics_LinAlg.zip

@zasexton
Contributor Author

Tentative speed testing between mat_fun.cpp with optional Eigen backend versus mat_fun_carray.cpp:

mat_fun.cpp

[==========] Running 22 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 22 tests from MatFunAllOpsPerfTest
[ RUN      ] MatFunAllOpsPerfTest.MatDdotPerformance
[mat_ddot] Eigen: 1 nd=300 Time: 8.93e-05 sec
[       OK ] MatFunAllOpsPerfTest.MatDdotPerformance (6 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDetPerformance
[mat_det] Eigen: 1 nd=3 Time: 5.7e-05 sec
[       OK ] MatFunAllOpsPerfTest.MatDetPerformance (0 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDevPerformance
[mat_dev] Eigen: 1 nd=300 Time: 0.002081 sec
[       OK ] MatFunAllOpsPerfTest.MatDevPerformance (7 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatDyadProdPerformance
[mat_dyad_prod] Eigen: 1 nd=300 Time: 0.0039048 sec
[       OK ] MatFunAllOpsPerfTest.MatDyadProdPerformance (4 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatIdPerformance
[mat_id] Eigen: 1 nd=300 Time: 8.14e-05 sec
[       OK ] MatFunAllOpsPerfTest.MatIdPerformance (2 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatInvPerformance
[mat_inv] Eigen: 1 nd=300 Time: 0.0133745 sec
[       OK ] MatFunAllOpsPerfTest.MatInvPerformance (16 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMulMatrixPerformance
[mat_mul(Matrix*Matrix)] Eigen: 1 nd=300 Time: 0.0097094 sec
[       OK ] MatFunAllOpsPerfTest.MatMulMatrixPerformance (14 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMulMatrixVectorPerformance
[mat_mul(Matrix*Vector)] Eigen: 1 nd=2000 Time: 0.0025162 sec
[       OK ] MatFunAllOpsPerfTest.MatMulMatrixVectorPerformance (116 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatSymmPerformance
[mat_symm] Eigen: 1 nd=300 Time: 0.0008718 sec
[       OK ] MatFunAllOpsPerfTest.MatSymmPerformance (3 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatTracePerformance
[mat_trace] Eigen: 1 nd=300 Time: 9.6e-06 sec
[       OK ] MatFunAllOpsPerfTest.MatTracePerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TransposePerformance
[transpose] Eigen: 1 nd=300 Time: 0.0001678 sec
[       OK ] MatFunAllOpsPerfTest.TransposePerformance (2 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenInitPerformance
[ten_init] Eigen: 1 nd=20 Time: 0.0013735 sec
[       OK ] MatFunAllOpsPerfTest.TenInitPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenAsymProd12Performance
[ten_asym_prod12] Eigen: 1 nd=20 Time: 0.00189 sec
[       OK ] MatFunAllOpsPerfTest.TenAsymProd12Performance (3 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdotPerformance
[ten_ddot] Eigen: 1 nd=20 Time: 0.0088478 sec
[       OK ] MatFunAllOpsPerfTest.TenDdotPerformance (11 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdot2412Performance
[ten_ddot_2412] Eigen: 1 nd=20 Time: 0.0046942 sec
[       OK ] MatFunAllOpsPerfTest.TenDdot2412Performance (6 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDdot3424Performance
[ten_ddot_3424] Eigen: 1 nd=20 Time: 0.0043354 sec
[       OK ] MatFunAllOpsPerfTest.TenDdot3424Performance (6 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenDyadProdPerformance
[ten_dyad_prod] Eigen: 1 nd=20 Time: 0.0003418 sec
[       OK ] MatFunAllOpsPerfTest.TenDyadProdPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenIdsPerformance
[ten_ids] Eigen: 1 nd=20 Time: 0.000961 sec
[       OK ] MatFunAllOpsPerfTest.TenIdsPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenMddotPerformance
[ten_mddot] Eigen: 1 nd=20 Time: 0.0001634 sec
[       OK ] MatFunAllOpsPerfTest.TenMddotPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenSymmProdPerformance
[ten_symm_prod] Eigen: 1 nd=20 Time: 0.0006506 sec
[       OK ] MatFunAllOpsPerfTest.TenSymmProdPerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.TenTransposePerformance
[ten_transpose] Eigen: 1 nd=20 Time: 0.0004849 sec
[       OK ] MatFunAllOpsPerfTest.TenTransposePerformance (1 ms)
[ RUN      ] MatFunAllOpsPerfTest.MatMul6x3Performance
[mat_mul6x3] Eigen: 1 Time: 5.4e-06 sec
[       OK ] MatFunAllOpsPerfTest.MatMul6x3Performance (1 ms)
[----------] 22 tests from MatFunAllOpsPerfTest (253 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 1 test suite ran. (257 ms total)
[  PASSED  ] 22 tests.

mat_fun_carray.cpp

[==========] Running 27 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 27 tests from MatFunCarrayPerfAllTest
[ RUN      ] MatFunCarrayPerfAllTest.TenInitPerformance
[TenInitPerformance] Eigen enabled: 1 nd=20 Time: 0.0081152 sec
[       OK ] MatFunCarrayPerfAllTest.TenInitPerformance (8 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatZeroPerformance2D
[MatZero2DPerformance] N=300 Time: 0.0004938 sec
[       OK ] MatFunCarrayPerfAllTest.MatZeroPerformance2D (0 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatDetPerformance
[MatDetPerformance] N=3 Time: 3.9e-06 sec
[       OK ] MatFunCarrayPerfAllTest.MatDetPerformance (1 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatIdPerformance
[MatIdPerformance] N=300 Time: 0.0007043 sec
[       OK ] MatFunCarrayPerfAllTest.MatIdPerformance (0 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TransposePerformance
[TransposePerformance] N=300 Time: 0.0007957 sec
[       OK ] MatFunCarrayPerfAllTest.TransposePerformance (1 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatMulPerformance
[MatMulPerformance] N=300 Time: 0.128799 sec
[       OK ] MatFunCarrayPerfAllTest.MatMulPerformance (130 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatMulVectorPerformance
[MatMulVectorPerformance] N=2000 Time: 0.0194144 sec
[       OK ] MatFunCarrayPerfAllTest.MatMulVectorPerformance (34 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatMulVectorArrPerformance
[MatMulVectorArrPerformance] N=2000 Time: 0.0127597 sec
[       OK ] MatFunCarrayPerfAllTest.MatMulVectorArrPerformance (28 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatMul6x3Performance
[MatMul6x3Performance] dimension=12x12 and 12x3 Time: 4.4e-06 sec
[       OK ] MatFunCarrayPerfAllTest.MatMul6x3Performance (0 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenMatDdotPerformance
[TenMatDdotPerformance] nd=20 Time: 0.0007067 sec
[       OK ] MatFunCarrayPerfAllTest.TenMatDdotPerformance (4 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatInvPerformance
[MatInvPerformance] N=300 Time: 1.2e-06 sec
[       OK ] MatFunCarrayPerfAllTest.MatInvPerformance (5 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatTracePerformance
[MatTracePerformance] N=300 Time: 2.8e-06 sec
[       OK ] MatFunCarrayPerfAllTest.MatTracePerformance (1 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenZeroPerformance
[TenZeroPerformance] nd=20 Time: 0.0003982 sec
[       OK ] MatFunCarrayPerfAllTest.TenZeroPerformance (0 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenDyadProdPerformance
[TenDyadProdPerformance(Array)] nd=20 Time: 0.0053323 sec
[       OK ] MatFunCarrayPerfAllTest.TenDyadProdPerformance (6 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenIdsPerformance
[TenIdsPerformance] nd=20 Time: 0.0003936 sec
[       OK ] MatFunCarrayPerfAllTest.TenIdsPerformance (1 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatDyadProdPerformance
[MatDyadProdPerformance] N=300 Time: 0.0010379 sec
[       OK ] MatFunCarrayPerfAllTest.MatDyadProdPerformance (1 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatSymmProdPerformance
[MatSymmProdPerformance] N=300 Time: 0.0013694 sec
[       OK ] MatFunCarrayPerfAllTest.MatSymmProdPerformance (3 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenSymmProdPerformance
[TenSymmProdPerformance] nd=20 Time: 0.0046167 sec
[       OK ] MatFunCarrayPerfAllTest.TenSymmProdPerformance (6 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatDdotPerformance
[MatDdotPerformance] N=300 Time: 0.0005766 sec
[       OK ] MatFunCarrayPerfAllTest.MatDdotPerformance (3 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenTransposePerformance
[TenTransposePerformance] nd=20 Time: 0.0040224 sec
[       OK ] MatFunCarrayPerfAllTest.TenTransposePerformance (7 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenDdotFullPerformance
[TenDdotFullPerformance] nd=20 Time: 0.0097928 sec
[       OK ] MatFunCarrayPerfAllTest.TenDdotFullPerformance (17 ms)
[ RUN      ] MatFunCarrayPerfAllTest.NormPerformance
[NormPerformance(arr)] N=300 Time: 2e-06 sec
[       OK ] MatFunCarrayPerfAllTest.NormPerformance (1 ms)
[ RUN      ] MatFunCarrayPerfAllTest.NormPerformanceVector
[NormPerformance(Vector,arr)] N=300 Time: 2.3e-06 sec
[       OK ] MatFunCarrayPerfAllTest.NormPerformanceVector (0 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatSymmPerformance
[MatSymmPerformance] N=300 Time: 0.0005573 sec
[       OK ] MatFunCarrayPerfAllTest.MatSymmPerformance (3 ms)
[ RUN      ] MatFunCarrayPerfAllTest.MatDevPerformance
[MatDevPerformance] N=300 Time: 0.001858 sec
[       OK ] MatFunCarrayPerfAllTest.MatDevPerformance (3 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenDdot3424Performance
[TenDdot3424Performance] nd=20 Time: 0.0095896 sec
[       OK ] MatFunCarrayPerfAllTest.TenDdot3424Performance (16 ms)
[ RUN      ] MatFunCarrayPerfAllTest.TenDdot2412Performance
[TenDdot2412Performance] nd=20 Time: 0.0089012 sec
[       OK ] MatFunCarrayPerfAllTest.TenDdot2412Performance (16 ms)
[----------] 27 tests from MatFunCarrayPerfAllTest (399 ms total)

[----------] Global test environment tear-down
[==========] 27 tests from 1 test suite ran. (405 ms total)
[  PASSED  ] 27 tests.

We notice that the optional Eigen backend can lead to improvements over native C-arrays for operations in which compilers cannot readily auto-vectorize the for loops. Still, we need to test these functions more rigorously to ensure that the speed measurements are reliable. @aabrown100-git this might be of interest given some recent comments on the material modeling.
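
One way to make such timings less sensitive to noise is to repeat each measurement and report a robust statistic. The sketch below is an assumption about how that could be done, not the harness used for the logs above; `median_seconds` is a hypothetical helper built only on the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `fn` `repeats` times and return the median
// wall-clock time in seconds. The median damps one-off outliers such as
// cold caches or scheduler noise better than a single run or a mean.
template <typename Fn>
double median_seconds(Fn&& fn, std::size_t repeats)
{
  assert(repeats > 0);
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(stop - start).count());
  }
  // Partially sort so the middle sample (upper median for even counts)
  // ends up at position size / 2.
  auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;
}
```

When timing very short operations (several entries above are on the order of microseconds), the result of the timed call should be stored somewhere observable, for example a `volatile` variable, so that an optimizing compiler cannot remove the work being measured.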

@zasexton
Contributor Author

zasexton commented Dec 10, 2024

As a general note, for optimally compiled code with Eigen, check that the compiler is using optimization level -O3 and that it targets the current machine's CPU architecture with -march=native. Otherwise Eigen is not guaranteed to use the SIMD instruction sets (e.g. SSE4, AVX-512) available on that machine.
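
A quick way to confirm what a given build actually enabled is to inspect the macros the compiler predefines. The sketch below assumes GCC or Clang, which set these macro names according to flags such as -O3 and -march=native; other compilers use different names. Both function names are hypothetical. Eigen itself also exposes `Eigen::SimdInstructionSetsInUse()`, which reports the instruction sets it was compiled to use.

```cpp
#include <string>

// Report which SIMD-related macros the compiler predefined for this build.
// Macro names follow GCC/Clang conventions; MSVC uses a different set.
inline std::string simd_macros_enabled()
{
  std::string out;
#ifdef __SSE2__
  out += "SSE2 ";
#endif
#ifdef __SSE4_2__
  out += "SSE4.2 ";
#endif
#ifdef __AVX__
  out += "AVX ";
#endif
#ifdef __AVX2__
  out += "AVX2 ";
#endif
#ifdef __FMA__
  out += "FMA ";
#endif
#ifdef __AVX512F__
  out += "AVX512F ";
#endif
#ifdef __ARM_NEON
  out += "NEON ";
#endif
  if (out.empty()) {
    out = "none detected";
  }
  return out;
}

// True when GCC or Clang compiled this translation unit with optimization on.
inline bool optimized_build()
{
#ifdef __OPTIMIZE__
  return true;
#else
  return false;
#endif
}
```

Printing both values at the start of a benchmark run records the build configuration next to the timings, which makes results from different machines easier to compare.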

@aabrown100-git
Collaborator

aabrown100-git commented Dec 11, 2024

@zasexton Nice, these results are interesting! It looks like what I've been doing with Eigen in the material models here might be better replaced by what you're doing.

One question: could you run these tests on small matrix and tensor sizes (e.g. nd = 3), which are relevant for the material models? I saw some information online suggesting Eigen Tensors may be nonperformant for smaller sizes, and in my preliminary tests I found my Eigen Tensor operations in the material models to be significantly slower than C-arrays.
