crypto/internal/fips140/aes: optimize amd64 #76059
Fixed two issues in the avo-based generator of amd64 assembly code.

1. Updated the golang.org/x/tools dependency to fix a build failure with Go 1.25:

> golang.org/x/[email protected]/internal/tokeninternal/tokeninternal.go:64:9:
> invalid array length -delta * delta (constant -256 of type int64)

This error was caused by a change in the layout of a data structure in Go. Package golang.org/x/tools keeps a mirror of that struct and a static assertion that the mirror matches Go's struct.

2. Changed the package name from crypto/aes to crypto/internal/fips140/aes. This fixed the run-time error

> ctr_amd64_asm.go:31: could not find function "ctrBlocks1Asm"

and other errors.

Now the following works as expected:

$ cd src/crypto/internal/fips140/aes/_asm/ctr/
$ go generate

The command regenerates the file "src/crypto/internal/fips140/aes/ctr_amd64.s".

Fixes golang#75972

Change-Id: I28e4c9ebb5bf72506a524e36a0c81a1b50367a84
GitHub-Last-Rev: afc9f50
GitHub-Pull-Request: golang#75973
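The "invalid array length -delta * delta" message in item 1 comes from a compile-time size assertion. The sketch below shows how such an assertion can work in general; the `original` and `mirror` types are hypothetical stand-ins, not the structs used by golang.org/x/tools. When the two sizes differ by `delta` bytes, the array length becomes `-delta*delta`, which is negative, so the build fails. A reported constant of -256 would correspond to a size difference of 16 bytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// original stands in for a struct owned by another package (hypothetical).
type original struct {
	name  string
	base  int
	size  int
	lines []int
}

// mirror is a hand-maintained copy of the same layout (hypothetical).
type mirror struct {
	name  string
	base  int
	size  int
	lines []int
}

// delta is the size difference between the two layouts, in bytes.
// unsafe.Sizeof yields a constant here, so delta is a compile-time constant.
const delta = int64(unsafe.Sizeof(mirror{})) - int64(unsafe.Sizeof(original{}))

// -delta*delta is zero when the sizes match and negative otherwise.
// A negative array length is rejected by the compiler, so any size
// mismatch turns into a build error at this line.
var _ [-delta * delta]int

func main() {
	fmt.Println("layouts have the same size, delta =", delta)
}
```

Adding a field to only one of the two structs makes the program stop compiling, which is the kind of failure the dependency update resolves. Note that this check compares sizes only, not field order or types.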
This PR (HEAD: 8568edb) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/714361.
Message from Gopher Robot: Patch Set 1: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
8568edb to ef888f1
This PR (HEAD: ef888f1) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/714361.
Message from Борис Нагаев: Patch Set 3: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
Message from Filippo Valsorda: Patch Set 3: (2 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
Message from Борис Нагаев: Patch Set 3: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
Implement an overflow-aware optimization in ctrBlocks8Asm: take a fast
path when there is no counter overflow. One branch per 8 blocks is faster
than 7 increments in general-purpose registers followed by transfers from
them to XMM registers.
Added AES-192 and AES-256 modes to the AES-CTR benchmark.
Added a correctness test in ctr_aes_test.go for the overflow optimization.
This improves performance, especially in AES-128 mode.
goos: windows
goarch: amd64
pkg: crypto/cipher
cpu: AMD Ryzen 7 5800H with Radeon Graphics
                  │     B/s      │     B/s        vs base
AESCTR/128/50-16    1.377Gi ± 0%   1.384Gi ± 0%    +0.51% (p=0.028 n=20)
AESCTR/128/1K-16    6.164Gi ± 0%   6.892Gi ± 1%   +11.81% (p=0.000 n=20)
AESCTR/128/8K-16    7.372Gi ± 0%   8.768Gi ± 1%   +18.95% (p=0.000 n=20)
AESCTR/192/50-16    1.289Gi ± 0%   1.279Gi ± 0%    -0.75% (p=0.001 n=20)
AESCTR/192/1K-16    5.734Gi ± 0%   6.011Gi ± 0%    +4.83% (p=0.000 n=20)
AESCTR/192/8K-16    6.889Gi ± 1%   7.437Gi ± 0%    +7.96% (p=0.000 n=20)
AESCTR/256/50-16    1.170Gi ± 0%   1.163Gi ± 0%    -0.54% (p=0.005 n=20)
AESCTR/256/1K-16    5.235Gi ± 0%   5.391Gi ± 0%    +2.98% (p=0.000 n=20)
AESCTR/256/8K-16    6.361Gi ± 0%   6.676Gi ± 0%    +4.94% (p=0.000 n=20)
geomean             3.681Gi        3.882Gi         +5.46%
The slight slowdown on 50-byte workloads is unrelated to this change,
because such workloads never use ctrBlocks8Asm.
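The idea behind the fast path can be illustrated in plain Go. This is only a sketch of the technique, not the generated assembly: the `counter` type, the function names, and the assumption that the check is on the low 64-bit half of a 128-bit big-endian counter are all illustrative and not taken from the change itself.

```go
package main

import (
	"fmt"
	"math/bits"
)

// counter is a 128-bit CTR counter split into two 64-bit halves
// (hypothetical representation for this sketch).
type counter struct{ hi, lo uint64 }

// next8Slow derives 8 consecutive counters, propagating the carry
// from the low half into the high half for every block.
func next8Slow(c counter) [8]counter {
	var out [8]counter
	for i := range out {
		lo, carry := bits.Add64(c.lo, uint64(i), 0)
		out[i] = counter{hi: c.hi + carry, lo: lo}
	}
	return out
}

// next8Fast checks once whether c.lo+7 would wrap. If it cannot, all
// 8 counters share the same high half and only the low half changes.
func next8Fast(c counter) [8]counter {
	if c.lo > ^uint64(0)-7 {
		return next8Slow(c) // rare case: a carry occurs within these 8 blocks
	}
	var out [8]counter
	for i := range out {
		out[i] = counter{hi: c.hi, lo: c.lo + uint64(i)}
	}
	return out
}

func main() {
	inputs := []counter{
		{hi: 1, lo: 42},                  // far from overflow
		{hi: 1, lo: ^uint64(0) - 7},      // last value that still takes the fast path
		{hi: 1, lo: ^uint64(0) - 3},      // wraps in the middle of the group
		{hi: ^uint64(0), lo: ^uint64(0)}, // wraps the whole 128-bit counter
	}
	for _, c := range inputs {
		fmt.Printf("%+v: fast == slow: %v\n", c, next8Fast(c) == next8Slow(c))
	}
}
```

The single comparison replaces per-block carry handling for almost all inputs, since only 7 out of every 2^64 low-half values trigger the slow path in this model.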
ef888f1 to 8579bce
This PR (HEAD: 8579bce) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/714361.
Message from Борис Нагаев: Patch Set 4: (2 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
Message from AHMAD ابو وليد: Patch Set 4: Code-Review+1 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
Implement an overflow-aware optimization in ctrBlocks8Asm: take a fast
path when there is no counter overflow. One branch per 8 blocks is faster
than 7 increments in general-purpose registers followed by transfers from
them to XMM registers.
Added AES-192 and AES-256 modes to the AES-CTR benchmark.
Added a correctness test in ctr_test.go for the overflow optimization.
This improves performance, especially in AES-128 mode.
goos: windows
goarch: amd64
pkg: crypto/cipher
cpu: AMD Ryzen 7 5800H with Radeon Graphics
                  │     B/s      │     B/s        vs base
AESCTR/128/50-16    1.377Gi ± 0%   1.384Gi ± 0%    +0.51% (p=0.028 n=20)
AESCTR/128/1K-16    6.164Gi ± 0%   6.892Gi ± 1%   +11.81% (p=0.000 n=20)
AESCTR/128/8K-16    7.372Gi ± 0%   8.768Gi ± 1%   +18.95% (p=0.000 n=20)
AESCTR/192/50-16    1.289Gi ± 0%   1.279Gi ± 0%    -0.75% (p=0.001 n=20)
AESCTR/192/1K-16    5.734Gi ± 0%   6.011Gi ± 0%    +4.83% (p=0.000 n=20)
AESCTR/192/8K-16    6.889Gi ± 1%   7.437Gi ± 0%    +7.96% (p=0.000 n=20)
AESCTR/256/50-16    1.170Gi ± 0%   1.163Gi ± 0%    -0.54% (p=0.005 n=20)
AESCTR/256/1K-16    5.235Gi ± 0%   5.391Gi ± 0%    +2.98% (p=0.000 n=20)
AESCTR/256/8K-16    6.361Gi ± 0%   6.676Gi ± 0%    +4.94% (p=0.000 n=20)
geomean             3.681Gi        3.882Gi         +5.46%
The slight slowdown on 50-byte workloads is unrelated to this change,
because such workloads never use ctrBlocks8Asm.
Updates #76061
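A correctness check for counter overflow could look roughly like the following. This is a sketch and not the test added in ctr_test.go: `referenceCTR` and `matchesReference` are hypothetical helpers, and the key, IV, and message contents are arbitrary. It places the low 64 bits of the IV a few blocks before wrap-around and compares `cipher.NewCTR` against a block-by-block reference that increments the 128-bit big-endian counter explicitly.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// referenceCTR encrypts src one block at a time, incrementing the
// 128-bit big-endian counter with an explicit carry after each block.
func referenceCTR(block cipher.Block, iv, src []byte) []byte {
	hi := binary.BigEndian.Uint64(iv[:8])
	lo := binary.BigEndian.Uint64(iv[8:])
	dst := make([]byte, len(src))
	var ctr, ks [aes.BlockSize]byte
	for off := 0; off < len(src); off += aes.BlockSize {
		binary.BigEndian.PutUint64(ctr[:8], hi)
		binary.BigEndian.PutUint64(ctr[8:], lo)
		block.Encrypt(ks[:], ctr[:])
		for i := 0; i < aes.BlockSize && off+i < len(src); i++ {
			dst[off+i] = src[off+i] ^ ks[i]
		}
		var carry uint64
		lo, carry = bits.Add64(lo, 1, 0)
		hi += carry
	}
	return dst
}

// matchesReference reports whether cipher.NewCTR agrees with referenceCTR
// when the low half of the IV starts `back` increments before wrapping.
func matchesReference(keyLen int, back uint64) bool {
	key := make([]byte, keyLen)
	for i := range key {
		key[i] = byte(i)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	iv := make([]byte, aes.BlockSize)
	binary.BigEndian.PutUint64(iv[:8], 0x0123456789abcdef)
	binary.BigEndian.PutUint64(iv[8:], ^uint64(0)-back)

	src := make([]byte, 32*aes.BlockSize) // several groups of 8 blocks
	for i := range src {
		src[i] = byte(i * 7)
	}
	got := make([]byte, len(src))
	cipher.NewCTR(block, iv).XORKeyStream(got, src)
	return bytes.Equal(got, referenceCTR(block, iv, src))
}

func main() {
	for _, keyLen := range []int{16, 24, 32} { // AES-128, AES-192, AES-256
		ok := true
		for back := uint64(0); back <= 16; back++ {
			ok = ok && matchesReference(keyLen, back)
		}
		fmt.Printf("AES-%d: matches reference near overflow: %v\n", keyLen*8, ok)
	}
}
```

Sweeping `back` from 0 to 16 moves the wrap-around point through every position inside and across 8-block groups, which is where a fast path and a slow path are most likely to disagree.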