Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid resetting method.hashUpdate #95

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

mikolajlubiak
Copy link

This is done to avoid setting the variable twice, because in a multithreaded environment this destroys memory caches. Note that to make the code cleaner I used an early return in the special sha256_ni_hashUpdate case.

Compilation using the Intel® oneAPI DPC++/C++ compiler succeeded.
Code formatted according to the .clang-format rules.
Added a note in change log and incremented the version.

Fixes intel/linux-sgx#1073

This is done to avoid setting the variable twice, because in a
multithreaded environment this destroys memory caches.
Note that to make the code cleaner I used an early return in the special `sha256_ni_hashUpdate` case.

Fixes intel/linux-sgx#1073

Signed-off-by: Mikołaj Lubiak <[email protected]>
@mikolajlubiak
Copy link
Author

To cite the issue:

Since this structure is static and shared across all threads, calling this method from different threads causes the function pointer to keep changing for all threads involved with devastating consequences for the memory caches (on the pure ippcp sample without using SGX, we could see a massive memory bottleneck due to L1D and L3 cache misses using perf).

@mikolajlubiak
Copy link
Author

mikolajlubiak commented Mar 4, 2025

Did a couple of benchmark runs, and the new code saves about 1,000,000,000 cache references, 10 CPU seconds (10% decrease), and halves the instruction count, in the specific benchmark case. Numbers hold roughly the same proportions in all runs.
With fix:

❯ sudo perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./benchmark 12 10000000
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 5652985402 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 5654593436 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6133624034 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6137065569 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6217676274 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6218518712 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6277306025 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6278381844 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6356617497 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6360215651 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6448754165 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6455433524 ns

 Performance counter stats for './benchmark 12 10000000':

     2,539,162,050      cache-references
            97,644      cache-misses                     #    0.00% of all cache refs
   303,215,955,461      cycles
    56,939,106,874      instructions                     #    0.19  insn per cycle
     5,594,461,691      branches
               162      faults
                 9      migrations

       6.457206156 seconds time elapsed

      73.972820000 seconds user
       0.005963000 seconds sys

Without fix:

❯ sudo perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./benchmark 12 10000000
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6324792106 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6328368805 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6948123032 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6972823042 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6989185139 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7003213614 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7044943411 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7054015430 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7101390258 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7104537262 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7106029344 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7121662468 ns

 Performance counter stats for './benchmark 12 10000000':

     3,727,422,918      cache-references
           120,280      cache-misses                     #    0.00% of all cache refs
   339,299,104,432      cycles
   110,108,473,947      instructions                     #    0.32  insn per cycle
     5,839,178,741      branches
               164      faults
                10      migrations

       7.124216913 seconds time elapsed

      82.828447000 seconds user
       0.005983000 seconds sys

Benchmark code: https://gist.github.com/mikolajlubiak/4d4c2f3286c03455f0aca256300aa3fe (from haxelion)

@mikolajlubiak
Copy link
Author

Some other stats with the same benchmark:

With fix:

❯ sudo perf stat -B -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses ./benchmark 12 10000000
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6149230421 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6153150286 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6195275445 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6204341769 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6204890586 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6208547958 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6266405663 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6269321465 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6269740576 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6274113901 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6402036998 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6408649862 ns

 Performance counter stats for './benchmark 12 10000000':

       289,679,436      LLC-loads                                                               (50.00%)
            17,378      LLC-load-misses                  #    0.01% of all LL-cache accesses    (49.99%)
       256,823,565      LLC-stores                                                              (50.01%)
             2,493      LLC-store-misses                                                        (50.01%)

       6.410538773 seconds time elapsed

      74.650541000 seconds user
       0.000993000 seconds sys

Without fix:

❯ sudo perf stat -B -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses ./benchmark 12 10000000
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6249371647 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6254551471 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6717360964 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6739020889 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6950421995 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 6963391824 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7013288374 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7060327093 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7068903938 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7072578410 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7175201365 ns
SHA256 result: 6b4ab10d92373474a97d10639e667f6b734af012f9be29997f1befd1c7166199 in 7182094432 ns

 Performance counter stats for './benchmark 12 10000000':

       467,841,050      LLC-loads                                                               (50.00%)
            18,428      LLC-load-misses                  #    0.00% of all LL-cache accesses    (49.99%)
       275,681,401      LLC-stores                                                              (50.01%)
             2,539      LLC-store-misses                                                        (50.01%)

       7.183901829 seconds time elapsed

      82.007108000 seconds user
       0.014439000 seconds sys

@haxelion
Copy link

@mikolajlubiak sorry I was AFK last week.
Thank you very much for looking into it and making this PR!

It is unfortunate that the IPP CP API and the TCrypto API doesn't mesh well and forces us to call the ippsHashMethod_SHA256_TT function each time but the only way around it would be a deeper refactoring of TCrypto.

I think that change can be broadly applied to all ippsHashMethod_SHA*_TT functions since they would all contain the same dynamic switching to the SHA-NI implementation.

@mikolajlubiak
Copy link
Author

It is unfortunate that the IPP CP API and the TCrypto API doesn't mesh well and forces us to call the ippsHashMethod_SHA256_TT function each time but the only way around it would be a deeper refactoring of TCrypto.

Yes, although this PR improves the performance a bit its still really poor.

I think that change can be broadly applied to all ippsHashMethod_SHA*_TT functions since they would all contain the same dynamic switching to the SHA-NI implementation.

You are right. I wanted to do that, but unfortunately had no free time.

@stanciuadrian
Copy link

Thanks @mikolajlubiak and @haxelion for your valuable contribution and feedback.
The performance results are really useful to understand what you see on your side.

We'll check it internally and get back to you with an answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Very Poor Multi-Threaded Performance of lib TCrypto Hashing Functions
3 participants