Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Implement OpenSSL's AES GCM acceleration for armv8 #16601

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

w0xel
Copy link
Contributor

@w0xel w0xel commented Oct 3, 2024

This is WIP still (but it works), and if you agree I'd like to split this PR up into smaller PRs next before merging:

  • Reenable kfpu_allowed() for newer kernels to make use of NEON registers
  • Add the single-block AES implementation and key expansion from OpenSSL
  • Restructure multi-block AVX AES GCM codepath slightly to allow for sharing code with ARM codepath
  • (This PR): Implement OpenSSL's AES-GCM acceleration

Important TODO (putting this high up here so I don't forget):

  • Either fix or drop 128bit and 192bit kernels, they are untouched currently and will break when used.

I'll post some benchmarks on #12171 next.

Motivation and Context

Encrypted ZFS is currently very slow on aarch64 devices.
This due to missing hardware acceleration support: #12171
Also the use of NEON is disabled on newer kernels: #14555

Description

Assembly generated from these files in the OpenSSL source:

(ASM diffs at time of creation, will try to update those)

with slight modifications to support the ICP interface.

For the chunked AES-GCM codepath the AVX code was reused by renaming avx -> hardware.

  • This is a lot of changes right now, but I've preferred this over copy-pasting the code, as it was resuable basically 1-to-1
  • If you have a better idea of how to abstract/structure this let me know :)

Kernel FPU support was re-added by naively storing/restoring vector registers.

  • This is basically the same as the x86_64 code, but a bit easier
  • AFAIK armv8 has no alignment requirements here, am I wrong?
  • We should IMHO disable this codepath on RT kernels. Same for x86.

And a bunch of support/wiring code around that.

How Has This Been Tested?

Definitely not well enough yet, do not merge yet.

The code was tested on a Raspberry Pi 5. I've created an encrypted fs, benchmarked it, swapped AES implementations, benchmarked it again and vice versa.

I did not test the changes on x86_64 yet, which is also affected by this PR.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Assembly generated from these files in the OpenSSL source:
 - crypto/modes/asm/ghashv8-armx.pl
 - crypto/modes/asm/aes-gcm-armv8_64.pl
 - crypto/aes/asm/aesv8-armx.pl

with slight modifications to support the ICP interface.

For the chunked AES-GCM codepath the AVX code was reused by
renaming `avx` -> `hardware`.

Kernel FPU support was readded by naively storing/restoring
vector registers.

Signed-off-by: Sebastian Wuerl <[email protected]>
@w0xel
Copy link
Contributor Author

w0xel commented Oct 3, 2024

Some more info in this comment: #12171 (comment)

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 4, 2024
@leelists
Copy link

awesome, this make encryption usable on rk3588 :-)

@mcmilk
Copy link
Contributor

mcmilk commented Oct 24, 2024

In a first view it looks really good. I will review it deeper on one or two weeks.
Thanks a lot for this PR.

@Harry-Chen
Copy link
Contributor

Harry-Chen commented Oct 31, 2024

It would be clearer if you could split into separate commits, e.g.:

  • one that implements & enables kfpu on aarch64 (so that current neon kernels for hashing and encryption would work again)
  • one that port some kernel files from openssl project (that would be ~8000 LOC)
  • one that enables these kernels and adds the dispatching

Edit: Sorry I missed your description that says you would split PRs, which exactly matched my thought. Please do!

@amotin amotin added the Status: Work in Progress Not yet ready for general review label Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing Status: Work in Progress Not yet ready for general review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants