Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHA2 reworking and API for iterating over multiple implementations #13741

Closed
wants to merge 8 commits into from

Conversation

mcmilk
Copy link
Contributor

@mcmilk mcmilk commented Aug 5, 2022

This RFC is about replacing the solaris SHA2 implementation with a public domain variant of SHA2.
Also a new generic implementation API is introduced and could be used for other algorithms like AES, Fletcher and so on...


Here are some benchmarks, which are done while loading the zfs module. By default the fastest impl. will be used.

Linux debian-x64 5.10.0-13-amd64 - VM on AMD Ryzen 5800X
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic              1940    2517    2709    2762    2761    2772    2793    2777
skein-generic               894     988    1015    1022    1030    1024    1027    1026

sha256-generic              319     342     349     350     349     350     351     351
sha256-current-openssl      421     453     458     466     466     466     467     467
sha256-x64                  441     474     480     480     484     484     484     485
sha256-ssse3                497     590     620     626     630     629     627     627
sha256-avx                  277     311     323     325     326     326     325     326
sha256-avx2                 541     640     679     688     687     684     687     686
sha256-shani               1314    1915    2184    2260    2276    2282    2280    2283

sha512-generic              455     516     534     537     534     535     534     533
sha512-current-openssl      617     697     729     732     736     737     737     736
sha512-x64                  643     723     758     763     763     762     763     765
sha512-avx                  387     463     487     497     498     497     498     498
sha512-avx2                 738     975    1058    1076    1084    1079    1083    1082

blake3-generic              313     320     315     317     317     317     317     314
blake3-sse2                 348    1448    1646    2040    1920    2010    1820    1751
blake3-sse41                304    1807    2053    2165    2211    2248    2303    2494
blake3-avx2                 384    2783    4617    5036    5013    4925    4860    4871

Linux debian-x64 5.10.0-13-amd64 - VM on Intel i3-1005G1
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic              1304    1649    1733    1754    1786    1774    1752    1741
skein-generic               552     603     615     616     617     620     617     613
sha256-generic              254     274     279     280     280     281     280     282
sha256-x64                  289     310     317     318     318     317     318     317
sha256-ssse3                307     342     353     356     357     357     357     356
sha256-avx                  312     347     360     362     361     362     362     362
sha256-avx2                 332     378     392     394     396     396     394     396
sha256-shani                881    1113    1209    1222    1237    1236    1223    1228
sha512-generic              363     412     428     430     424     423     423     424
sha512-x64                  421     474     493     494     493     496     495     495
sha512-avx                  430     523     551     559     560     559     555     557
sha512-avx2                 455     567     599     606     609     608     605     609
blake3-generic              329     323     323     323     325     283     283     314
blake3-sse2                 384    1343    1426    1446    1452    1433    1389    1395
blake3-sse41                412    1539    1647    1693    1693    1660    1602    1616
blake3-avx2                 409    1956    3148    3328    3301    3171    3008    3018
blake3-avx512               446    2792    5117    5815    5777    5539    4911    4927

AArch64 VM on my Mac Mini M1:
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic              2707    3044    3372    3397    3427    3347    3366    3385
skein-generic               556     581     589     592     595     591     589     589
sha256-generic              286     303     305     308     307     307     307     305
sha256-neon                 358     370     379     380     381     381     380     380
sha256-armv8-ce            2164    2310    2344    2336    2304    2343    2338    2340
sha512-generic              431     477     485     489     489     490     476     490
sha512-armv8-ce            1198    1327    1368    1363    1372    1369    1335    1368
blake3-generic              597     592     594     584     585     589     588     588
blake3-sse2                 435    1422    1507    1558    1558    1537    1446    1516
blake3-sse41                464    1518    1643    1663    1657    1650    1609    1610

TODO

The PR #13649 can use additional hardware acceleration via FreeBSD drivers, it should also go into OpenZFS, but not within this pull request.

Description

The skeleton file module/icp/include/generic_impl.c can be used for iterating over different implementations of algorithms. It is used by SHA256, SHA512 and BLAKE3 currently.

The Solaris SHA2 implementation got replaced with public domain code of [ccpcrypto 0.10 (https://sourceforge.net/projects/cppcrypto/files/cppcrypto-0.10-src.zip/download).

These assembly files are taken from current openssl master:

  • sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64)
  • sha512-x86_64.S: x64, AVX, AVX2 (x86_64)
  • sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm)
  • sha512-armv7.S: ARMv7, NEON (arm)
  • sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64)
  • sha512-armv8.S: ARMv7, ARMv8-CE (aarch64)
  • sha256-ppc.S: Generic PPC64 LE/BE (ppc64)
  • sha512-ppc.S: Generic PPC64 LE/BE (ppc64)
  • sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)
  • sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)

Signed-off-by: Tino Reichardt [email protected]

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@mcmilk mcmilk mentioned this pull request Aug 5, 2022
13 tasks
@mcmilk mcmilk force-pushed the sha2-rework branch 4 times, most recently from 10c2baa to 97c0f4c Compare August 5, 2022 20:29
@rincebrain
Copy link
Contributor

I'll give these a spin on my various sundry machines later today or tomorrow.

@rincebrain
Copy link
Contributor

rincebrain commented Aug 7, 2022

Of course, it's important to include the silliest part, the example benchmarks.

Ryzen 5900X:

$ uname -r
5.18.0-0.bpo.1-amd64
$ cat /proc/spl/kstat/zfs/chksum_bench
15 0 0x01 -1 0 242790676536984 242792205359807
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic               973    1196    1239    1285    1280    1280    1280    1278
skein-generic               419     458     468     471     472     471     471     470
sha256-generic              192     206     209     210     211     211     211     211
sha256-ssse3                260     280     286     287     286     287     288     287
sha256-avx                  137     145     147     148     148     148     148     148
sha256-avx2                 284     305     312     313     313     313     313     313
sha256-shani                912    1002    1030    1037    1038    1038    1038    1038
sha512-generic              272     308     318     321     322     322     322     321
sha512-avx                  195     217     224     226     226     226     226     226
sha512-avx2                 412     470     489     494     496     495     495     494
blake3-generic              284     288     289     288     288     288     288     288
blake3-sse2                 165     930    1010    1034    1030    1016    1017    1017
blake3-sse41                173    1029    1138    1171    1169    1152    1153    1151
blake3-avx2                 172    1278    2099    2287    2268    2210    2223    2215

AArch64 VM on my Mac Mini M1:

$ uname -r
5.10.0-16-arm64
$ cat /proc/spl/kstat/zfs/chksum_bench
15 0 0x01 -1 0 1425811553429 1434441457433
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic              2707    3044    3372    3397    3427    3347    3366    3385
skein-generic               556     581     589     592     595     591     589     589
sha256-generic              286     303     305     308     307     307     307     305
sha256-neon                 358     370     379     380     381     381     380     380
sha256-armv8-ce            2164    2310    2344    2336    2304    2343    2338    2340
sha512-generic              431     477     485     489     489     490     476     490
sha512-armv8-ce            1198    1327    1368    1363    1372    1369    1335    1368
blake3-generic              597     592     594     584     585     589     588     588
blake3-sse2                 435    1422    1507    1558    1558    1537    1446    1516
blake3-sse41                464    1518    1643    1663    1657    1650    1609    1610

Sadly, ztest appears to be entirely broken, rather than merely sometimes unstable, on my testbed with this patch merged, and works fine without it:

~/zfs_vanilla_forreal $ ./ztest -VVVVV -GGG
choosing RAID type 'raidz'
5 vdevs, 7 datasets, 23 threads,4 raidz disks, 300 seconds...

Executing /home/rich/zfs_vanilla_forreal/zdb -bccsv -G -d -Y -e -y  -p /tmp ztest
verifying concrete vdev 0, metaslab 14 of 15 ...
Verifying deleted livelist entries
Verifying metaslab entries
Dataset mos [META], ID 0, cr_txg 4, 89.8K, 56 objects
[etc]
~/zfs_newsha $ ./ztest -VVVVV -GGG
choosing RAID type 'raidz'
5 vdevs, 7 datasets, 23 threads,4 raidz disks, 300 seconds...

Executing /home/rich/zfs_newsha/zdb -bccsv -G -d -Y -e -y  -p /tmp ztest
zdb: can't open 'ztest': No such device or address

ZFS_DBGMSG(zdb) START:
spa.c:6133:spa_import(): spa_import: importing ztest
spa_misc.c:416:spa_load_note(): spa_load(ztest, config trusted): LOADING
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.2a': vdev_validate: failed reading config for txg 18446744073709551615
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.0a': vdev_validate: failed reading config for txg 18446744073709551615
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.7a': vdev_validate: failed reading config for txg 18446744073709551615
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.4a': vdev_validate: failed reading config for txg 18446744073709551615
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.1a': vdev_validate: failed reading config for txg 18446744073709551615
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.6a': vdev_validate: failed reading config for txg 18446744073709551615
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.3a': vdev_validate: failed reading config for txg 18446744073709551615
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.5a': vdev_validate: failed reading config for txg 18446744073709551615
spa_misc.c:402:spa_load_failed(): spa_load(ztest, config untrusted): FAILED: cannot open vdev tree after invalidating some vdevs
vdev.c:205:vdev_dbgmsg_print_tree():   vdev 0: root, guid: 5616588175193895870, path: N/A, can't open
vdev.c:205:vdev_dbgmsg_print_tree():     vdev 0: mirror, guid: 3888918000877878788, path: N/A, can't open
vdev.c:205:vdev_dbgmsg_print_tree():       vdev 0: raidz, guid: 5965506435800278309, path: N/A, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 0: file, guid: 18212454090720735338, path: /tmp/ztest.0a, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 1: file, guid: 12178677185200324091, path: /tmp/ztest.1a, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 2: file, guid: 6958433759899787561, path: /tmp/ztest.2a, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 3: file, guid: 14252546041931747235, path: /tmp/ztest.3a, can't open
vdev.c:205:vdev_dbgmsg_print_tree():       vdev 1: raidz, guid: 8100377519981327636, path: N/A, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 0: file, guid: 6727364101487571618, path: /tmp/ztest.4a, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 1: file, guid: 2466317645343985470, path: /tmp/ztest.5a, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 2: file, guid: 7668133104849780700, path: /tmp/ztest.6a, can't open
vdev.c:205:vdev_dbgmsg_print_tree():         vdev 3: file, guid: 11291587442423573644, path: /tmp/ztest.7a, can't open
spa_misc.c:416:spa_load_note(): spa_load(ztest, config untrusted): UNLOADING
ZFS_DBGMSG(zdb) END
ztest: '/home/rich/zfs_newsha/zdb -bccsv -G -d -Y -e -y  -p /tmp ztest' exit code 1

ZFS_DBGMSG(ztest) START:
ZFS_DBGMSG(ztest) END
child exited with code 3

So that bodes ill. (This is on my Debian 11 system with a -backports kernel and --enable-debug, but it works/fails the same way on the AArch64 VM too.)

(I'm blaming ztest and not zdb because zdb from the unmodified tree fails the same way on the resulting vdev files.)

edit 2: well, it fails...DIFFERENTLY with --enable-asan...

ZFS_DBGMSG(zdb) START:                                                                                                                                                                               |
spa.c:6133:spa_import(): spa_import: importing ztest                                                                                                                                                 |
spa_misc.c:416:spa_load_note(): spa_load(ztest, config trusted): LOADING                                                                                                                             |
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.0a': best uberblock found for spa ztest. txg 4                                                                                                       |
spa_misc.c:416:spa_load_note(): spa_load(ztest, config untrusted): using uberblock with txg=4                                                                                                        |
spa.c:8399:spa_async_request(): spa=ztest async request task=2048                                                                                                                                    |
spa_misc.c:416:spa_load_note(): spa_load(ztest, config trusted): LOADED                                                                                                                              |
spa.c:8399:spa_async_request(): spa=ztest async request task=32                                                                                                                                      |
ZFS_DBGMSG(zdb) END                                                                                                                                                                                  |
testing spa_freeze()...                                                                                                                                                                              |
zio_crypt_key_init(crypt, &dck.dck_key) == 0 (0x5 == 0)                                                                                                                                              |
ASSERT at module/zfs/dsl_crypt.c:2541:dsl_crypto_key_create_sync()/lib/x86_64-linux-gnu/libasan.so.6(+0x43df1)[0x7f07ca243df1]                                                                       |
/home/rich/zfs_newsha/.libs/ztest(+0x27107)[0x5609017ee107]                                                                                                                                          |
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f07caceb140]                                                                                                                                      |
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f07c9332ce1]                                                                                                                                       |
/lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f07c931c537]                                                                                                                                         |
/home/rich/zfs_newsha/.libs/libzpool.so.5(+0x15fecc)[0x7f07c975fecc]                                                                                                                                 |
/home/rich/zfs_newsha/.libs/libzpool.so.5(dsl_crypto_key_create_sync+0x498)[0x7f07c98d5308]                                                                                                          |
/home/rich/zfs_newsha/.libs/libzpool.so.5(dsl_dataset_create_crypt_sync+0x572)[0x7f07c98d5a82]                                                                                                       |
/home/rich/zfs_newsha/.libs/libzpool.so.5(dsl_dataset_create_sync_dd+0xd9e)[0x7f07c98eadde]                                                                                                          |
/home/rich/zfs_newsha/.libs/libzpool.so.5(dsl_dataset_create_sync+0x552)[0x7f07c98ebbd2]                                                                                                             |
/home/rich/zfs_newsha/.libs/libzpool.so.5(+0x26a66f)[0x7f07c986a66f]                                                                                                                                 |
/home/rich/zfs_newsha/.libs/libzpool.so.5(dsl_sync_task_sync+0x204)[0x7f07c9948334]                                                                                                                  |
/home/rich/zfs_newsha/.libs/libzpool.so.5(dsl_pool_sync+0xb10)[0x7f07c9920a00]                                                                                                                       |
/home/rich/zfs_newsha/.libs/libzpool.so.5(spa_sync+0x13c4)[0x7f07c99ae5a4]                                                                                                                           |
/home/rich/zfs_newsha/.libs/libzpool.so.5(+0x3ecfcf)[0x7f07c99ecfcf]                                                                                                                                 |
/home/rich/zfs_newsha/.libs/libzpool.so.5(+0x163892)[0x7f07c9763892]                                                                                                                                 |
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f07cacdfea7]                                                                                                                                       |
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f07c93f4def]                                                                                                                                          |
                                                                                                                                                                                                     |
ZFS_DBGMSG(ztest) START:                                                                                                                                                                             |
spa.c:5210:spa_open_common(): spa_open_common: opening ztest                                                                                                                                         |
spa_misc.c:416:spa_load_note(): spa_load(ztest, config trusted): LOADING                                                                                                                             |
vdev.c:153:vdev_dbgmsg(): file vdev '/tmp/ztest.0a': best uberblock found for spa ztest. txg 4                                                                                                       |
spa_misc.c:416:spa_load_note(): spa_load(ztest, config untrusted): using uberblock with txg=4                                                                                                        |
spa_misc.c:416:spa_load_note(): spa_load(ztest, config trusted): read 1 log space maps (1 total blocks - blksz = 131072 bytes) in 0 ms                                                               |
mmp.c:239:mmp_thread_start(): MMP thread started pool 'ztest' gethrtime 248827800761194                                                                                                              |
spa.c:8399:spa_async_request(): spa=ztest async request task=2048                                                                                                                                    |
spa_misc.c:416:spa_load_note(): spa_load(ztest, config trusted): LOADED                                                                                                                              |
spa_history.c:306:spa_history_log_sync(): txg 6 open pool version 5000; software version zfs-2.1.99-1329-g5a57a1f79; uts melusine 5.18.0-0.bpo.1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.2-1~bpo11+1|
 (2022-06-14) x86_64                                                                                                                                                                                 |
ZFS_DBGMSG(ztest) END                                                                                                                                                                                |
child died with signal 6                                                    ```

@mcmilk
Copy link
Contributor Author

mcmilk commented Aug 8, 2022

I think that zdb and ztest are okay and I am missing sth.... I will take a closer look.
Because this PR replaces the whole SHA2 code, I would be happy if someone from the OpenZFS team could say sth. for or against it... I will invenstigate the failure, it makes sense.

@rincebrain
Copy link
Contributor

rincebrain commented Aug 8, 2022 via email

memset(m + pos, 0, (size_t)(128 - pos));
mlen = cpu_to_be64(ctx->count[1]);
memcpy(m + (128 - 8), &mlen, 64 / 8);
kfpu_end();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that, reverse it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @rincebrain - this branch is just an RFC ... please do not test in deep, it's full of bugs, I need to find in the weekend ;-)
But the EdonR shrinking and the BLAKE3 tuneable fix are ready for a review.

The kfpu_begin()/end() are on my TODO for this branch here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't spend a bunch of time testing it, I just assumed it would at least run enough to try.

I'd advise marking the PR as a draft, and putting a more explicit disclaimer about it being known to not function correctly, if that's the case, otherwise some people who test open PRs if they seem interesting might try it and be distressed.

And the specific thing I was pointing out was that you had reversed kfpu_end and kfpu_begin in that one case, not that you were missing them more generally or should delete them.

I was waiting for there to be a few days when those two PRs didn't get an update to review them, as force-pushing over the entire branch makes it difficult to tell what's changed, and the last two times I started reviewing one of them, they got updated before I finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the generic methods the kfpu begin/end was not okay there... I will put it into the transform() function itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was waiting for there to be a few days when those two PRs didn't get an update to review them, as force-pushing over the entire branch makes it difficult to tell what's changed, and the last two times I started reviewing one of them, they got updated before I finished.

I am sorry for this. I rebased the Edon-R branch because some reference for this #13710 was requested by @gmelikov

And the next update to the sha2-rework will be a seperate commit - okay :)

@mcmilk mcmilk marked this pull request as draft August 11, 2022 03:37
@mcmilk mcmilk force-pushed the sha2-rework branch 5 times, most recently from 03dac33 to d36c47b Compare August 22, 2022 15:06
@mcmilk
Copy link
Contributor Author

mcmilk commented Aug 22, 2022

@rincebrain - the patch is ready for real testing now.
I will add armv4 and maybe some x86-64 assembly ... but the main thing is done. The last commit will be spitted to about 4 or 5 commits... until the weekend. I am searching my raspberry pi3 now ^^

Have you got some time for the EdonR thing?

@mcmilk mcmilk marked this pull request as ready for review August 23, 2022 22:24
@mcmilk mcmilk force-pushed the sha2-rework branch 2 times, most recently from 0021ef5 to 138d178 Compare August 24, 2022 05:26
@mcmilk mcmilk mentioned this pull request Sep 7, 2022
13 tasks
@mcmilk mcmilk force-pushed the sha2-rework branch 4 times, most recently from 8e72060 to 1d40c3b Compare September 17, 2022 09:04
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 3, 2023
The skeleton file module/icp/include/generic_impl.c can be used for
iterating over different implementations of algorithms.

It is used by SHA256, SHA512 and BLAKE3 currently.

The Solaris SHA2 implementation got replaced with a version which is
based on public domain code of cppcrypto v0.10.

These assembly files are taken from current openssl master:
- sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64)
- sha512-x86_64.S: x64, AVX, AVX2 (x86_64)
- sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm)
- sha512-armv7.S: ARMv7, NEON (arm)
- sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64)
- sha512-armv8.S: ARMv7, ARMv8-CE (aarch64)
- sha256-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha512-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)
- sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 3, 2023
This commit changes the BLAKE3 implementation handling and
also the calls to it from the ztest command.

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 3, 2023
- instead of ".section .rodata" we should use SECTION_STATIC

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
@mcmilk
Copy link
Contributor Author

mcmilk commented Mar 3, 2023

A bit unfortunate, it blows away the ASMABI work for the calling convention ABI.

Is there really some problem with the SHA2 files of this PR within the Windows Port?
If yes, I will help with fixes, when needed.

I used ENTRY_ALIGN(), SET_SIZE() and SECTION_STATIC ...

@lundman
Copy link
Contributor

lundman commented Mar 3, 2023

Give me a sec, just merging in and making sure its running.. it'll look something like:

openzfsonwindows@2934c88

hrmm i need to pull out the kfpu_vars as a PR - so I can fix up in a separate PR

@mcmilk
Copy link
Contributor Author

mcmilk commented Mar 3, 2023

Give me a sec, just merging in and making sure its running.. it'll look something like:

openzfsonwindows@2934c88

hrmm i need to pull out the kfpu_vars as a PR - so I can fix up in a separate PR

Ah, I see - sorry for these. Maybe an extra PR just for these changes would be good - so they get in quickly.
I would re-view this PR extra fast ;-)

@lundman
Copy link
Contributor

lundman commented Mar 4, 2023

Ignore previous messages, they were wrong.

The issue was actually that it uses the TF macro for most calls, to make wrappers then call into assembly;

#define	TF(E, N) \
	extern void ASMABI E(uint32_t s[8], const void *, size_t); \
	static inline void N(uint32_t s[8], const void *d, size_t b) { \
	kfpu_vars; kfpu_begin(); E(s, d, b); kfpu_end(); \
}

Except for the first type:

extern void zfs_sha256_transform_x64(uint32_t s[8], const void *, size_t);
 const sha256_ops_t sha256_x64_impl = {
        .is_supported = sha2_is_supported,
       .transform = zfs_sha256_transform_x64,

which created a case where we do regular C calls to "wrappers" (which then do ASMABI calls out), except for zfs_sha256_transform_x64() which needed to be ASMABI directly. Making it used inconsistently.

If I change that function to also have a wrapper;

TF(zfs_sha256_transform_x64, tf_sha256_transform_x64);
 const sha256_ops_t sha256_x64_impl = {
        .is_supported = sha2_is_supported,
       .transform = tf_sha256_transform_x64,

Then it is consistent. All wrappers are called in C, and all assembly are called with ASMABI.

I presume that zfs_sha256_transform_x64() was done differently as it does not need to call kfpu_begin() ?

If that is the case, I can:

  1. Make another macro like TF2, which doesn't call kfpu_begin() and code is nice and clean
  2. Make a manual wrapper just above it, not using TF macro.
  3. Use #ifdef _WIN32|APPLE and only make wrappers in for those platforms.

@lundman
Copy link
Contributor

lundman commented Mar 4, 2023

Also, the nested kfpu_begin() calls seem to have been cleaned up, so I can probably get rid of kfpu_vars

@mcmilk
Copy link
Contributor Author

mcmilk commented Mar 4, 2023

Hello @lundman - yes, the zfs_sha256_transform_x64() and zfs_sha512_transform_x64() functions use only regular non-SIMD code and do not need the kfpu() helper.... you should be able to use all these ossl SHA2 assembly directly in MacOS and Windows.

What we need in general, is some new SPL code to get crypto functions from FreeBSD, MacOS and Windows supported. I am thinking about this ... and will create some PR as draft when there is sth. ready. Maybe you had also this idea.

@lundman lundman mentioned this pull request Mar 4, 2023
13 tasks
@Low-power
Copy link
Contributor

It appears that IBM POWER7 (Power ISA 2.06) has being mis-detected to use zfs_sha256_power8, resulted in a kernel oops upon module loading:

[1673143.127724] zfs: module license 'CDDL' taints kernel.
[1673143.127782] Disabling lock debugging due to kernel taint
[1673143.912577] Unrecoverable VMX/Altivec Unavailable Exception f20 at d000000006e53050
[1673143.912652] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1]
[1673143.912713] SMP NR_CPUS=128 NUMA pSeries
[1673143.912766] Modules linked in: zfs(PO+) spl(O) nfnetlink rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs lockd sunrpc grace fscache usb_storage joydev st xt_tcpudp iptable_filter snd_usb_audio cp210x pl2303 snd_usbmidi_lib snd_hwdep usbserial snd_pcm snd_timer snd_rawmidi snd_seq_device snd soundcore evdev uio_pdrv_genirq uio fuse drm configfs ip_tables x_tables autofs4 ext4 crc16 jbd2 mbcache crc32c_generic btrfs xor raid6_pq sg sr_mod cdrom sd_mod ses enclosure hid_generic usbhid hid ohci_pci ohci_hcd ehci_pci ehci_hcd ipr igb ptp pps_core usbcore usb_common libata ehea
[1673143.913693] CPU: 0 PID: 8725 Comm: insmod Tainted: P           O    4.1.42-rivoreo-powerpc64-largepage #20
[1673143.913779] task: c0000002eefe5970 ti: c00000062d77c000 task.ti: c00000062d77c000
[1673143.913841] NIP: d000000006e53050 LR: d000000006c5e700 CTR: d000000006c5e6f0
[1673143.913901] REGS: c00000062d77ed90 TRAP: 0f20   Tainted: P           O     (4.1.42-rivoreo-powerpc64-largepage)
[1673143.913990] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 44002424  XER: 00000000
[1673143.914122] CFAR: d000000006c5e6fc SOFTE: 1 
[1673143.914122] GPR00: d000000006c5e480 c00000062d77f010 d000000006f696a0 c00000062d77f490 
[1673143.914122] GPR04: d00000000d040000 0000000000000010 0000000000000000 0000000000000001 
[1673143.914122] GPR08: d000000006c5e700 d000000006f418b8 00000000000000cf 00000000000000df 
[1673143.914122] GPR12: 0000000024008424 c00000000ee40000 0000000000000000 c000000000872a40 
[1673143.914122] GPR16: c00000062d77fc58 d00000000403d350 0000000000000124 0000000000000000 
[1673143.914122] GPR20: c000000000617d50 0005f1b71bd88c36 00000000000f423f c00000062d77f4c0 
[1673143.914122] GPR24: d00000000d040400 c00000062d77f490 d000000006e89300 0000000000000400 
[1673143.914122] GPR28: 0000000000002000 0000000000002000 0000000000000000 0000000000000000 
[1673143.915022] NIP [d000000006e53050] .zfs_sha256_power8+0x10/0xec0 [zfs]
[1673143.915231] LR [d000000006c5e700] .tf_sha256_power8+0x10/0x24 [zfs]
[1673143.915285] Call Trace:
[1673143.915315] [c00000062d77f010] [c00000000084dd00] 0xc00000000084dd00 (unreliable)
[1673143.915537] [c00000062d77f190] [d000000006c5e6dc] .tf_sha256_ppc+0x10/0x24 [zfs]
[1673143.915751] [c00000062d77f200] [d000000006c5e480] .SHA2Update+0x12c/0x19c [zfs]
[1673143.915982] [c00000062d77f2c0] [d000000006da1050] .sha_incremental+0x24/0x3c [zfs]
[1673143.916207] [c00000062d77f330] [d000000006d1a1c0] .abd_iterate_func+0xec/0x1a0 [zfs]
[1673143.916441] [c00000062d77f420] [d000000006da10ac] .abd_checksum_sha256+0x44/0x80 [zfs]
[1673143.916673] [c00000062d77f5b0] [d000000006dfa3a0] .chksum_run.isra.0+0xbc/0x114 [zfs]
[1673143.916932] [c00000062d77f6b0] [d000000006dfa480] .chksum_benchit+0x88/0x1c8 [zfs]
[1673143.917185] [c00000062d77f740] [d000000006dfa810] .chksum_init+0x250/0x574 [zfs]
[1673143.917440] [c00000062d77f810] [d000000006dbd01c] .spa_init+0x160/0x210 [zfs]
[1673143.917690] [c00000062d77f8b0] [d000000006e0a960] .zfs_kmod_init+0x28/0xed0 [zfs]
[1673143.917941] [c00000062d77f970] [d000000006e5b664] .openzfs_init+0x48/0xfc [zfs]
[1673143.918029] [c00000062d77f9f0] [c00000000000ae88] .do_one_initcall+0x140/0x1dc
[1673143.918114] [c00000062d77fae0] [c0000000005c8634] .do_init_module+0x70/0x1e4
[1673143.918199] [c00000062d77fb80] [c000000000113820] .load_module+0x1b88/0x2054
[1673143.918282] [c00000062d77fd30] [c000000000113e5c] .SyS_finit_module+0x6c/0x74
[1673143.918367] [c00000062d77fe30] [c000000000009360] system_call+0x38/0xd8
[1673143.918446] Instruction dump:
[1673143.918499] 60000000 60000000 60000000 60000000 60000000 60000000 60000000 60000000 
[1673143.918638] f821fe81 7d0802a6 394000cf 396000df <7f0a09ce> 394a0020 7d8042a6 7f2b09ce 
[1673143.921799] ---[ end trace 2f19b717bc9d3c92 ]---
[1673143.926650] 

@mcmilk
Copy link
Contributor Author

mcmilk commented Mar 4, 2023

@Low-power - can you modify the zfs_sha256_power8() macro ?
I did the ppc testings only in userspace on machines of the gcc farm.

But I got fresh access to ppc machines here: https://osuosl.org/services/powerdev/ ... but I am not ready for testings there.

Edit: the this macro checks for power8: zfs_isa207_available() .... Power7 (2.06) should fail normally.
Edit2: my fault :/ --> the zfs_isa207_available() was not used

Can you try this patch:

diff --git a/module/icp/algs/sha2/sha256_impl.c b/module/icp/algs/sha2/sha256_impl.c
index 024cfb1e4..63f181d2e 100644
--- a/module/icp/algs/sha2/sha256_impl.c
+++ b/module/icp/algs/sha2/sha256_impl.c
@@ -141,9 +141,9 @@ const sha256_ops_t sha256_armv8_impl = {
 };

 #elif defined(__PPC64__)
-static boolean_t sha256_have_vsx(void)
+static boolean_t sha256_have_isa207(void)
 {
-       return (kfpu_allowed() && zfs_vsx_available());
+       return (kfpu_allowed() && zfs_isa207_available());
 }

 TF(zfs_sha256_ppc, tf_sha256_ppc);
@@ -155,7 +155,7 @@ const sha256_ops_t sha256_ppc_impl = {

 TF(zfs_sha256_power8, tf_sha256_power8);
 const sha256_ops_t sha256_power8_impl = {
-       .is_supported = sha256_have_vsx,
+       .is_supported = sha256_have_isa207,
        .transform = tf_sha256_power8,
        .name = "power8"
 };

@mcmilk
Copy link
Contributor Author

mcmilk commented Mar 4, 2023

I would provide an official PR when you say the kernel oops is away.

@Low-power
Copy link
Contributor

I would provide an official PR when you say the kernel oops is away.

Thanks for the patch, I will try it after a fresh reboot on next Monday.

@Low-power
Copy link
Contributor

I just tested it; the fix however is incomplete, icp/algs/sha2/sha512_impl.c needs a similar fix too.

[  517.748405] zfs: module license 'CDDL' taints kernel.
[  517.748477] Disabling lock debugging due to kernel taint
[  518.885693] Unrecoverable VMX/Altivec Unavailable Exception f20 at d000000008d43f10
[  518.885782] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1]
[  518.885857] SMP NR_CPUS=128 NUMA pSeries
[  518.885955] Modules linked in: zfs(PO+) spl(O) cpufreq_stats bridge stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc pl2303 snd_usb_audio snd_usbmidi_lib snd_hwdep cp210x snd_pcm snd_timer snd_rawmidi snd_seq_device usbserial snd soundcore evdev uio_pdrv_genirq uio xt_tcpudp iptable_filter fuse ip_tables x_tables autofs4 ext4 crc16 jbd2 mbcache crc32c_generic btrfs xor raid6_pq sg sr_mod sd_mod ses cdrom enclosure hid_generic usbhid hid ohci_pci ohci_hcd ehci_pci ehci_hcd igb ptp pps_core usbcore ipr usb_common libata ehea
[  518.887859] CPU: 0 PID: 6499 Comm: insmod Tainted: P           O    4.1.42-rivoreo-powerpc64-largepage #22
[  518.887951] task: c0000007b33c2670 ti: c0000007ad480000 task.ti: c0000007ad480000
[  518.888028] NIP: d000000008d43f10 LR: d000000008b4eccc CTR: d000000008b4ecbc
[  518.888101] REGS: c0000007ad482db0 TRAP: 0f20   Tainted: P           O     (4.1.42-rivoreo-powerpc64-largepage)
[  518.888194] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 44002424  XER: 00000000
[  518.888474] CFAR: d000000008b4ecc8 SOFTE: 1 
GPR00: d000000008b4df04 c0000007ad483030 d000000008e596a0 c0000007ad4834b0 
GPR04: d00000000bb40000 0000000000000008 0000000000000000 0000000000000001 
GPR08: d000000008b4eccc d000000008e31a08 00000000000000cf 00000000000000df 
GPR12: 0000000024008424 c00000000ee40000 0000000000000000 c000000000872a40 
GPR16: c0000007ad483c58 d0000000084ed350 0000000000000124 0000000000000000 
GPR20: c000000000617d50 00000078bd68f26d 00000000000f423f d000000008d79398 
GPR24: 0000000000000400 c0000007ad483500 0000000000002000 d00000000bb40400 
GPR28: c0000007ad4834b0 0000000000002000 0000000000000000 0000000000000000 
[  518.889999] NIP [d000000008d43f10] .zfs_sha512_power8+0x10/0xfc0 [zfs]
[  518.890210] LR [d000000008b4eccc] .tf_sha512_power8+0x10/0x24 [zfs]
[  518.890279] Call Trace:
[  518.890323] [c0000007ad483030] [c0000007ad4830d0] 0xc0000007ad4830d0 (unreliable)
[  518.890438] [c0000007ad4831b0] [c00000000084dd00] 0xc00000000084dd00
[  518.890662] [c0000007ad483220] [d000000008b4df04] .sha512_update+0xd4/0x108 [zfs]
[  518.890911] [c0000007ad4832e0] [d000000008c91050] .sha_incremental+0x24/0x3c [zfs]
[  518.891153] [c0000007ad483350] [d000000008c0a1c0] .abd_iterate_func+0xec/0x1a0 [zfs]
[  518.891403] [c0000007ad483440] [d000000008c9112c] .abd_checksum_sha512_native+0x44/0x60 [zfs]
[  518.891657] [c0000007ad4835b0] [d000000008cea3a0] .chksum_run.isra.0+0xbc/0x114 [zfs]
[  518.891913] [c0000007ad4836b0] [d000000008cea480] .chksum_benchit+0x88/0x1c8 [zfs]
[  518.892160] [c0000007ad483740] [d000000008cea934] .chksum_init+0x374/0x574 [zfs]
[  518.892408] [c0000007ad483810] [d000000008cad01c] .spa_init+0x160/0x210 [zfs]
[  518.892653] [c0000007ad4838b0] [d000000008cfa960] .zfs_kmod_init+0x28/0xed0 [zfs]
[  518.892897] [c0000007ad483970] [d000000008d4b664] .openzfs_init+0x48/0xfc [zfs]
[  518.892992] [c0000007ad4839f0] [c00000000000ae88] .do_one_initcall+0x140/0x1dc
[  518.893083] [c0000007ad483ae0] [c0000000005c8634] .do_init_module+0x70/0x1e4
[  518.893173] [c0000007ad483b80] [c000000000113820] .load_module+0x1b88/0x2054
[  518.893263] [c0000007ad483d30] [c000000000113e5c] .SyS_finit_module+0x6c/0x74
[  518.893357] [c0000007ad483e30] [c000000000009360] system_call+0x38/0xd8
[  518.893442] Instruction dump:
[  518.893528] 00010203 04050607 10111213 10111213 00010203 04050607 08090a0b 10111213 
[  518.893804] f821fe81 7d0802a6 394000cf 396000df <7f0a09ce> 394a0020 7d8042a6 7f2b09ce 
[  518.894081] ---[ end trace f273ebecb440dc69 ]---

@mcmilk mcmilk deleted the sha2-rework branch March 8, 2023 09:37
zfs_sha256_available(void)
{
unsigned long ftr = ((get_ftr(ID_AA64ISAR0_EL1)) >> 12) & 0x3;
return (ftr & 0x1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be different on non-Apple arm64, but my M2 reports:

<zfs`spl_cpuid_id_aa64isar0_el1 (spl-processor.c:172)> cpu_features0: 0x0221100110212120
<zfs`spl_cpuid_id_aa64isar0_el1 (spl-processor.c:172)> cpu_features0: AES PMULL SHA1 SHA256 SHA512 SHA3

Note that the SHA bits [12-15] (0xf) has the value 0x2. Which would in the above "if" only say yes to SHA512, but not SHA256. I take it to mean that if SHA512 is enabled, then SHA256 is implied as as well. I would think if should
be if (ftr & 0xf) being anything not-zero. (or 0x3 if preferred, but either bit)
Is this an Apple quirk?

It also happens with the AES bits [4-7]

            aes & 3 ? "AES " : "",
            aes & 2 ? "PMULL " : "",

If either bit is set, it has AES, if 0x2 is set, it also has PMULL.

@lundman
Copy link
Contributor

lundman commented Mar 20, 2023

OK, so changes to compile on macOS in shared files:

openzfsonosx@256294a
and changes to macOS files:

openzfsonosx@2f12e62

adding aesv8

openzfsonosx@ae7a8ac

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
We had three sha2.h headers in different places.
The FreeBSD version, the Linux version and the generic solaris version.

The only assembly used for acceleration was some old x86-64 openssl
implementation for sha256 within the icp module.

For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS,
these files got removed also.

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
These are added:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64
- zfs_shani_available() for x86_64

Changes:
- simd_powerpc.h: change license from CDDL to BSD

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
These are added:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64
- zfs_shani_available() for x86_64

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Co-Authored-By: Sebastian Gottschall <[email protected]>
Closes openzfs#13741
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
These are added via HWCAP interface:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64

This one via cpuid() call:
- zfs_shani_available() for x86_64

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
The skeleton file module/icp/include/generic_impl.c can be used for
iterating over different implementations of algorithms.

It is used by SHA256, SHA512 and BLAKE3 currently.

The Solaris SHA2 implementation got replaced with a version which is
based on public domain code of cppcrypto v0.10.

These assembly files are taken from current openssl master:
- sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64)
- sha512-x86_64.S: x64, AVX, AVX2 (x86_64)
- sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm)
- sha512-armv7.S: ARMv7, NEON (arm)
- sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64)
- sha512-armv8.S: ARMv7, ARMv8-CE (aarch64)
- sha256-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha512-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)
- sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
This commit changes the BLAKE3 implementation handling and
also the calls to it from the ztest command.

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
- instead of ".section .rodata" we should use SECTION_STATIC

Tested-by: Rich Ercolani <[email protected]>
Tested-by: Sebastian Gottschall <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#13741
behlendorf added a commit that referenced this pull request Oct 13, 2023
New Features
- Block cloning (#13392)
- Linux container support (#14070, #14097, #12263)
- Scrub error log (#12812, #12355)
- BLAKE3 checksums (#12918)
- Corrective "zfs receive"
- Vdev and zpool user properties

Performance
- Fully adaptive ARC (#14359)
- SHA2 checksums (#13741)
- Edon-R checksums (#13618)
- Zstd early abort (#13244)
- Prefetch improvements (#14603, #14516, #14402, #14243, #13452)
- General optimization (#14121, #14123, #14039, #13680, #13613,
  #13606, #13576, #13553, #12789, #14925, #14948)

Signed-off-by: Brian Behlendorf <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants