Skip to content

Conversation

@starius
Copy link
Contributor

@starius starius commented Oct 26, 2025

Implement overflow-aware optimization in ctrBlocks8Asm: make a fast branch
in case when there is no overflow. One branch per 8 blocks is faster than
7 increments in general purpose registers and transfers from them to XMM.

Added AES-192 and AES-256 modes to the AES-CTR benchmark.

Added a correctness test in ctr_test.go for the overflow optimization.

This improves performance, especially in AES-128 mode.

goos: windows
goarch: amd64
pkg: crypto/cipher
cpu: AMD Ryzen 7 5800H with Radeon Graphics
│ B/s │ B/s vs base
AESCTR/128/50-16 1.377Gi ± 0% 1.384Gi ± 0% +0.51% (p=0.028 n=20)
AESCTR/128/1K-16 6.164Gi ± 0% 6.892Gi ± 1% +11.81% (p=0.000 n=20)
AESCTR/128/8K-16 7.372Gi ± 0% 8.768Gi ± 1% +18.95% (p=0.000 n=20)
AESCTR/192/50-16 1.289Gi ± 0% 1.279Gi ± 0% -0.75% (p=0.001 n=20)
AESCTR/192/1K-16 5.734Gi ± 0% 6.011Gi ± 0% +4.83% (p=0.000 n=20)
AESCTR/192/8K-16 6.889Gi ± 1% 7.437Gi ± 0% +7.96% (p=0.000 n=20)
AESCTR/256/50-16 1.170Gi ± 0% 1.163Gi ± 0% -0.54% (p=0.005 n=20)
AESCTR/256/1K-16 5.235Gi ± 0% 5.391Gi ± 0% +2.98% (p=0.000 n=20)
AESCTR/256/8K-16 6.361Gi ± 0% 6.676Gi ± 0% +4.94% (p=0.000 n=20)
geomean 3.681Gi 3.882Gi +5.46%

The slight slowdown on 50-byte workloads is unrelated to this change,
because such workloads never use ctrBlocks8Asm.

Updates #76061

Fixed two issues in AVO based generator of amd64 asm code.

1. Updated golang.org/x/tools dependency to prevent build issue in Go 1.25.

> golang.org/x/[email protected]/internal/tokeninternal/tokeninternal.go:64:9:
> invalid array length -delta * delta (constant -256 of type int64)

This error was caused by changes in layout of data structures in Go. Package
golang.org/x/tools has a mirror of that struct and a static assert that it
matches the Go's struct.

2. Changed the package name from crypto/aes to crypto/internal/fips140/aes.

This fixed run time error:

> ctr_amd64_asm.go:31: could not find function "ctrBlocks1Asm"
and other errors

Now the following works as expected:

$ cd src/crypto/internal/fips140/aes/_asm/ctr/
$ go generate

The command re-generates file "src/crypto/internal/fips140/aes/ctr_amd64.s".

Fixes golang#75972

Change-Id: I28e4c9ebb5bf72506a524e36a0c81a1b50367a84
GitHub-Last-Rev: afc9f50
GitHub-Pull-Request: golang#75973
@gopherbot
Copy link
Contributor

This PR (HEAD: 8568edb) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/714361.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
After addressing review feedback, remember to publish your drafts!

@starius starius force-pushed the aes-ctr-amd64-overflow-aware-optimization branch from 8568edb to ef888f1 Compare October 26, 2025 17:25
@gopherbot
Copy link
Contributor

This PR (HEAD: ef888f1) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/714361.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Copy link
Contributor

Message from Борис Нагаев:

Patch Set 3:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Filippo Valsorda:

Patch Set 3:

(2 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Борис Нагаев:

Patch Set 3:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
After addressing review feedback, remember to publish your drafts!

Implement overflow-aware optimization in ctrBlocks8Asm: make a fast branch
in case when there is no overflow. One branch per 8 blocks is faster than
7 increments in general purpose registers and transfers from them to XMM.

Added AES-192 and AES-256 modes to the AES-CTR benchmark.

Added a correctness test in ctr_aes_test.go for the overflow optimization.

This improves performance, especially in AES-128 mode.

goos: windows
goarch: amd64
pkg: crypto/cipher
cpu: AMD Ryzen 7 5800H with Radeon Graphics
                 │     B/s      │     B/s       vs base
AESCTR/128/50-16   1.377Gi ± 0%   1.384Gi ± 0%   +0.51% (p=0.028 n=20)
AESCTR/128/1K-16   6.164Gi ± 0%   6.892Gi ± 1%  +11.81% (p=0.000 n=20)
AESCTR/128/8K-16   7.372Gi ± 0%   8.768Gi ± 1%  +18.95% (p=0.000 n=20)
AESCTR/192/50-16   1.289Gi ± 0%   1.279Gi ± 0%   -0.75% (p=0.001 n=20)
AESCTR/192/1K-16   5.734Gi ± 0%   6.011Gi ± 0%   +4.83% (p=0.000 n=20)
AESCTR/192/8K-16   6.889Gi ± 1%   7.437Gi ± 0%   +7.96% (p=0.000 n=20)
AESCTR/256/50-16   1.170Gi ± 0%   1.163Gi ± 0%   -0.54% (p=0.005 n=20)
AESCTR/256/1K-16   5.235Gi ± 0%   5.391Gi ± 0%   +2.98% (p=0.000 n=20)
AESCTR/256/8K-16   6.361Gi ± 0%   6.676Gi ± 0%   +4.94% (p=0.000 n=20)
geomean            3.681Gi        3.882Gi        +5.46%

The slight slowdown on 50-byte workloads is unrelated to this change,
because such workloads never use ctrBlocks8Asm.
@starius starius force-pushed the aes-ctr-amd64-overflow-aware-optimization branch from ef888f1 to 8579bce Compare October 28, 2025 03:06
@gopherbot
Copy link
Contributor

This PR (HEAD: 8579bce) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/714361.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Copy link
Contributor

Message from Борис Нагаев:

Patch Set 4:

(2 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from AHMAD ابو وليد:

Patch Set 4: Code-Review+1

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/714361.
After addressing review feedback, remember to publish your drafts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants