perf: Add benchmarks and reduce allocations by ~50% for ARC and VOPRF #346

Merged
Lukasa merged 6 commits into apple:main from simonjbeaumont:benchmarks
Apr 9, 2025

Conversation

@simonjbeaumont
Contributor

Motivation

We recently added elliptic-curve (EC) operations internally to support schemes including ARC and VOPRF. Some of this code is allocation-heavy, and many of those allocations can be avoided with minimally invasive changes.

Modifications

  • Add package benchmarks for ARC and VOPRF.
  • Make use of the consuming keyword in some targeted places to reduce allocations in arithmetic- and append-chains.
  • Use a Thread-/Task-local FiniteFieldArithmeticContext for EC operations, and thread this through as a BN_CTX to BoringSSL functions when available.
  • Replace EC static computed properties with stored properties for things that are statically knowable about the EC group.
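
As a rough illustration of the second and fourth bullets, here is a minimal Swift sketch. `Transcript`, `appending(_:)`, `appendingByCopy(_:)`, and `ExampleCurve` are hypothetical stand-ins, not types from this PR, and the sketch assumes Swift 5.9 or later for the `consuming` modifier.

```swift
// Hypothetical stand-in for a value type that is built up through an append chain.
struct Transcript {
    var bytes: [UInt8] = []

    // Non-consuming variant: `self` is only borrowed, so the method copies it
    // before mutating, which can force the array storage to be duplicated.
    func appendingByCopy(_ more: [UInt8]) -> Transcript {
        var copy = self
        copy.bytes.append(contentsOf: more)
        return copy
    }

    // Consuming variant: the method takes ownership of `self`, so when the
    // caller passes a temporary the existing storage can be reused in place.
    consuming func appending(_ more: [UInt8]) -> Transcript {
        self.bytes.append(contentsOf: more)
        return self
    }
}

// Hypothetical stand-in for constants that are statically knowable per group.
enum ExampleCurve {
    // Computed property: the body runs on every access.
    static var coordinateByteCountComputed: Int { (384 + 7) / 8 }

    // Stored property: initialized once on first access, then simply read.
    static let coordinateByteCount: Int = (384 + 7) / 8
}

// Each step in the chain hands its temporary result to the next call.
let transcript = Transcript()
    .appending([0x01, 0x02])
    .appending([0x03])
    .appending([0x04, 0x05])
```

Both variants of the append produce the same bytes. The difference is only in how many intermediate copies the chain may need, which is what the allocation benchmarks measure.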

Result

This results in a roughly 50% reduction in allocations for ARC issuance (51%) and VOPRF evaluation (49%), and a 34% reduction for ARC verification.

SwiftCryptoBenchmarks
============================================================================================================================

----------------------------------------------------------------------------------------------------------------------------
arc-issue-p384 metrics
----------------------------------------------------------------------------------------------------------------------------

╒══════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│             Malloc (total) *             │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞══════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│                  alpha                   │       945 │       945 │       945 │       945 │       945 │       945 │       945 │         3 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                   beta                   │       464 │       464 │       464 │       464 │       464 │       464 │       464 │         3 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                    Δ                     │      -481 │      -481 │      -481 │      -481 │      -481 │      -481 │      -481 │         0 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│              Improvement %               │        51 │        51 │        51 │        51 │        51 │        51 │        51 │         0 │
╘══════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛

----------------------------------------------------------------------------------------------------------------------------
arc-verify-p384 metrics
----------------------------------------------------------------------------------------------------------------------------

╒══════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│             Malloc (total) *             │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞══════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│                  alpha                   │       410 │       410 │       410 │       415 │       415 │       415 │       415 │        10 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                   beta                   │       271 │       271 │       271 │       275 │       275 │       275 │       275 │        10 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                    Δ                     │      -139 │      -139 │      -139 │      -140 │      -140 │      -140 │      -140 │         0 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│              Improvement %               │        34 │        34 │        34 │        34 │        34 │        34 │        34 │         0 │
╘══════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛

----------------------------------------------------------------------------------------------------------------------------
voprf-evaluate-p384 metrics
----------------------------------------------------------------------------------------------------------------------------

╒══════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│             Malloc (total) *             │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞══════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│                  alpha                   │       331 │       331 │       331 │       331 │       331 │       331 │       331 │         3 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                   beta                   │       168 │       168 │       168 │       168 │       168 │       168 │       168 │         3 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                    Δ                     │      -163 │      -163 │      -163 │      -163 │      -163 │      -163 │      -163 │         0 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│              Improvement %               │        49 │        49 │        49 │        49 │        49 │        49 │        49 │         0 │
╘══════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛
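
The Δ and Improvement % rows follow from the alpha and beta rows. The sketch below reproduces the p50 figures. It assumes that alpha is the baseline and beta the new measurement, and that the percentage is rounded to the nearest integer. Both assumptions are consistent with the tables but are not confirmed here. `improvementPercent` is a hypothetical helper, not part of the benchmark tooling.

```swift
// Hypothetical helper: percentage reduction from `baseline` to `current`,
// rounded to the nearest whole percent.
func improvementPercent(baseline: Int, current: Int) -> Int {
    let reduction = Double(baseline - current) / Double(baseline) * 100
    return Int(reduction.rounded())
}

// p50 malloc counts taken from the tables above.
let arcIssue = improvementPercent(baseline: 945, current: 464)      // 481 / 945 ≈ 50.9 → 51
let arcVerify = improvementPercent(baseline: 410, current: 271)     // 139 / 410 ≈ 33.9 → 34
let voprfEvaluate = improvementPercent(baseline: 331, current: 168) // 163 / 331 ≈ 49.2 → 49
```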

@rnro
Contributor

rnro commented Apr 1, 2025

This looks really cool. I'm curious about if you know how this might interact with a CI strategy? Specifically I don't know how stable the CPU benchmarks would be and if it's possible to only check specific benchmarks in a thresholds check. No worries if you haven't put any thought into this, I'm just curious.

@simonjbeaumont
Contributor Author

> This looks really cool. I'm curious about if you know how this might interact with a CI strategy? Specifically I don't know how stable the CPU benchmarks would be and if it's possible to only check specific benchmarks in a thresholds check. No worries if you haven't put any thought into this, I'm just curious.

IIRC our current take was that the CPU time metrics were not stable enough to be part of the CI and that we prefer to use allocations as a proxy for this.
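
One way to picture allocation counts as the CI signal is a simple regression gate on the malloc total. The sketch below is purely illustrative: `allocationsRegressed` and the 5% margin are hypothetical, and it is neither the benchmark package's thresholds API nor this repository's CI configuration.

```swift
// Hypothetical gate: report a regression when the malloc count grows
// beyond an allowed percentage over the recorded baseline.
func allocationsRegressed(baseline: Int, current: Int, allowedIncreasePercent: Double) -> Bool {
    let limit = Double(baseline) * (1 + allowedIncreasePercent / 100)
    return Double(current) > limit
}

// 464 is the post-change arc-issue-p384 count from the PR description.
let unchanged = allocationsRegressed(baseline: 464, current: 464, allowedIncreasePercent: 5)
let regressed = allocationsRegressed(baseline: 464, current: 500, allowedIncreasePercent: 5)
// `unchanged` is false; `regressed` is true because 500 exceeds 464 * 1.05 = 487.2.
```

Allocation counts were identical or nearly identical across samples in the tables above, which is what makes them a more practical gate than CPU time.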

@Lukasa Lukasa enabled auto-merge (squash) April 9, 2025 10:35
@Lukasa Lukasa merged commit 6db5a75 into apple:main Apr 9, 2025
27 of 29 checks passed

Labels

🔨 semver/patch No public API change.
