@Vindaar Vindaar commented Dec 9, 2025

Implements compute_tree_level as a trait function of TweakableHash, whose default is the scalar implementation previously used in new_subtree. The Poseidon tweaks implementation overrides it with a SIMD packing variant, following the ideas from compute_tree_leaves.
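To illustrate the design, here is a heavily simplified, hypothetical sketch. It is not the leanSig code. A trait method has a scalar default, and one implementation overrides it with a batched variant. Nodes are plain u64 values, the "hash" is a toy non-cryptographic mixer, and the packed path only imitates SIMD lanes with fixed-size arrays. All names and signatures here are illustrative assumptions; the function is named after compute_tree_layer as it appears in the review diff. The property that matters is that both paths produce identical layers.

```rust
// Hypothetical, simplified sketch: u64 nodes and a toy (non-cryptographic)
// mixer stand in for field elements and Poseidon.
type Node = u64;

/// Toy two-to-one "hash" depending on the parameter, the tweak and both children.
fn toy_mix(parameter: u64, tweak: (u8, u64), left: Node, right: Node) -> Node {
    let mut x = parameter ^ ((tweak.0 as u64) << 56) ^ tweak.1;
    for v in [left, right] {
        x = (x ^ v).wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(29);
    }
    x
}

trait TweakableHash {
    fn apply(parameter: u64, tweak: (u8, u64), left: Node, right: Node) -> Node;

    /// Scalar default: hash consecutive child pairs into parents.
    /// `parent_level` is the level of the nodes being produced.
    fn compute_tree_layer(
        parameter: u64,
        parent_level: u8,
        parent_start: u64,
        children: &[Node],
    ) -> Vec<Node> {
        assert!(children.len() % 2 == 0, "layer must have an even number of nodes");
        children
            .chunks_exact(2)
            .enumerate()
            .map(|(i, pair)| {
                Self::apply(parameter, (parent_level, parent_start + i as u64), pair[0], pair[1])
            })
            .collect()
    }
}

/// Uses the scalar default unchanged.
struct ScalarHash;
impl TweakableHash for ScalarHash {
    fn apply(p: u64, t: (u8, u64), l: Node, r: Node) -> Node {
        toy_mix(p, t, l, r)
    }
}

/// Overrides the default with a batched path; LANES mimics a SIMD width.
const LANES: usize = 4;
struct PackedHash;
impl TweakableHash for PackedHash {
    fn apply(p: u64, t: (u8, u64), l: Node, r: Node) -> Node {
        toy_mix(p, t, l, r)
    }

    fn compute_tree_layer(
        parameter: u64,
        parent_level: u8,
        parent_start: u64,
        children: &[Node],
    ) -> Vec<Node> {
        assert!(children.len() % 2 == 0, "layer must have an even number of nodes");
        let mut parents = Vec::with_capacity(children.len() / 2);
        let mut batches = children.chunks_exact(2 * LANES);
        for batch in &mut batches {
            let base = parent_start + parents.len() as u64;
            // Vertical layout: lane i holds the children of the i-th parent.
            let lefts: [Node; LANES] = std::array::from_fn(|i| batch[2 * i]);
            let rights: [Node; LANES] = std::array::from_fn(|i| batch[2 * i + 1]);
            // Stand-in for one packed permutation over all lanes.
            let out: [Node; LANES] = std::array::from_fn(|i| {
                Self::apply(parameter, (parent_level, base + i as u64), lefts[i], rights[i])
            });
            parents.extend_from_slice(&out);
        }
        // Scalar tail for pairs that do not fill a whole batch.
        for pair in batches.remainder().chunks_exact(2) {
            let pos = parent_start + parents.len() as u64;
            parents.push(Self::apply(parameter, (parent_level, pos), pair[0], pair[1]));
        }
        parents
    }
}

fn main() {
    let children: Vec<Node> = (0..22).collect();
    let scalar = ScalarHash::compute_tree_layer(7, 1, 0, &children);
    let packed = PackedHash::compute_tree_layer(7, 1, 0, &children);
    assert_eq!(scalar, packed);
    println!("{} parents; scalar and packed layers match", scalar.len());
}
```

Keeping the scalar version as the trait default means every hash gets a correct implementation for free. Only hashes with a packed backend need to override it, and the scalar path doubles as a reference in tests.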

Edit:

I have now run the benchmarks with key generation timing enabled, i.e.:

RUSTFLAGS="-C target-cpu=native" cargo bench --bench benchmark --features with-gen-benches-poseidon-top-level

(plus an initial run on the main branch without target-cpu=native).

Here is a summary table; the full Criterion output is included below. All runs were on an AMD Ryzen 9 5950X.

| Configuration                                | Operation | Baseline (SIMD) | PR (SIMD+) |  Change |
|----------------------------------------------+-----------+-----------------+------------+---------|
| *Lifetime 2^8, Dim 64, Base 8*               | gen       | 3.75 ms         | 3.67 ms    |   -2.1% |
|                                              | sign      | 573.73 µs       | 553.89 µs  | *-3.5%* |
|                                              | verify    | 112.52 µs       | 113.79 µs  |   +1.1% |
| *Lifetime 2^18, Dim 64, Base 8*              | gen       | 1.484 s         | 1.496 s    |   +0.8% |
|                                              | sign      | 581.78 µs       | 573.22 µs  |   -1.5% |
|                                              | verify    | 126.95 µs       | 126.02 µs  |   -0.7% |
| *Lifetime 2^32, Dim 64, Base 8 (Hash Opt)*   | gen       | 1.523 s         | 1.516 s    |   -0.5% |
|                                              | sign      | 586.36 µs       | 580.74 µs  |   -1.0% |
|                                              | verify    | 141.40 µs       | 140.28 µs  |   -0.8% |
| *Lifetime 2^32, Dim 48, Base 10 (Trade-off)* | gen       | 1.332 s         | 1.321 s    |   -0.8% |
|                                              | sign      | 598.42 µs       | 565.76 µs  | *-5.5%* |
|                                              | verify    | 158.91 µs       | 156.58 µs  |   -1.5% |
| *Lifetime 2^32, Dim 32, Base 26 (Size Opt)*  | gen       | 1.864 s         | 1.759 s    | *-5.7%* |
|                                              | sign      | 954.53 µs       | 890.99 µs  | *-6.7%* |
|                                              | verify    | 251.19 µs       | 249.33 µs  |   -0.7% |

As expected, the improvements are only marginal, because hashing the Merkle leaves accounts for much more work than hashing the inner tree layers.
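A rough way to see why the tree layers matter so little is to count hash calls. The sketch below is a hypothetical back-of-the-envelope model, not a measurement. It assumes that each leaf needs dimension chains of base - 1 steps plus one leaf hash, and that a binary tree over n leaves needs n - 1 inner hashes. It ignores differing per-call costs (permutation widths, sponge absorption for the leaf hash) and the top-level/bottom-tree split.

```rust
// Hypothetical cost model: counts hash *calls* only, under the assumptions
// stated above. It says nothing about per-call cost.

/// Chain-step hash calls needed to compute one leaf's chain ends (assumed).
fn chain_calls_per_leaf(dimension: u64, base: u64) -> u64 {
    dimension * (base - 1)
}

/// Fraction of all hash calls spent on the inner tree layers.
fn inner_node_share(dimension: u64, base: u64, num_leaves: u64) -> f64 {
    let leaf_calls = num_leaves * (chain_calls_per_leaf(dimension, base) + 1);
    let tree_calls = num_leaves - 1;
    tree_calls as f64 / (leaf_calls + tree_calls) as f64
}

fn main() {
    // Assumption: roughly one leaf per activated epoch (2^18 in these benches).
    let num_leaves: u64 = 1 << 18;
    for (dimension, base) in [(64, 8), (48, 10), (32, 26)] {
        let share = inner_node_share(dimension, base, num_leaves);
        println!(
            "dim {dimension}, base {base}: tree layers ~ {:.2}% of hash calls",
            share * 100.0
        );
    }
}
```

Under these assumptions, the inner layers account for well under 1% of the calls in all three configurations. Even a large speedup of the layer code can therefore move total key generation time only a little.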

Baseline SIMD:

    Finished `bench` profile [optimized] target(s) in 5.66s
     Running benches/benchmark.rs (target/release/deps/benchmark-ae722d8401ccbb97)
Poseidon: Top Level TS, Lifetime 2^8, Activation 2^18, Dimension 64, Base 8/- gen
                        time:   [3.6866 ms 3.7453 ms 3.8007 ms]
                        change: [−58.827% −57.316% −55.959%] (p = 0.00 < 0.05)
                        Performance has improved.
Poseidon: Top Level TS, Lifetime 2^8, Activation 2^18, Dimension 64, Base 8/- sign
                        time:   [563.45 µs 573.73 µs 584.36 µs]
                        change: [−2.5255% −0.0325% +2.5404%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
Poseidon: Top Level TS, Lifetime 2^8, Activation 2^18, Dimension 64, Base 8/- verify
                        time:   [112.29 µs 112.52 µs 112.79 µs]
                        change: [−8.6570% −8.3984% −8.1439%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- gen: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 14.8s.
Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- gen
                        time:   [1.4783 s 1.4839 s 1.4902 s]
                        change: [−78.054% −77.893% −77.740%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high mild
Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- sign
                        time:   [571.21 µs 581.78 µs 592.84 µs]
                        change: [−3.9528% −1.6531% +0.8354%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- verify
                        time:   [126.85 µs 126.95 µs 127.06 µs]
                        change: [−2.8452% −2.6856% −2.5375%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- g...: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 15.1s.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- g...
                        time:   [1.5163 s 1.5226 s 1.5297 s]
                        change: [−77.524% −77.414% −77.297%] (p = 0.00 < 0.05)
                        Performance has improved.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- s...
                        time:   [575.66 µs 586.36 µs 597.06 µs]
                        change: [−4.7975% −2.4739% −0.0740%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- v...
                        time:   [140.79 µs 141.40 µs 142.09 µs]
                        change: [−7.2058% −6.7650% −6.3513%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- gen: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 13.9s.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- gen
                        time:   [1.3233 s 1.3316 s 1.3399 s]
                        change: [−78.685% −78.534% −78.394%] (p = 0.00 < 0.05)
                        Performance has improved.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- sign
                        time:   [586.28 µs 598.42 µs 611.16 µs]
                        change: [−0.7170% +2.1004% +4.8268%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- verify
                        time:   [158.44 µs 158.91 µs 159.44 µs]
                        change: [−5.0230% −4.7136% −4.3847%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- gen: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 18.9s.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- gen
                        time:   [1.8464 s 1.8643 s 1.8809 s]
                        change: [−82.085% −81.898% −81.731%] (p = 0.00 < 0.05)
                        Performance has improved.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- sig...
                        time:   [934.21 µs 954.53 µs 974.73 µs]
                        change: [−6.2811% −3.2789% −0.2006%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- ver...
                        time:   [250.60 µs 251.19 µs 251.88 µs]
                        change: [−1.3681% −1.1011% −0.8198%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

PR SIMD:

    Finished `bench` profile [optimized] target(s) in 5.66s
     Running benches/benchmark.rs (target/release/deps/benchmark-ae722d8401ccbb97)
Poseidon: Top Level TS, Lifetime 2^8, Activation 2^18, Dimension 64, Base 8/- gen
                        time:   [3.5737 ms 3.6676 ms 3.7613 ms]
                        change: [−4.7556% −2.0760% +0.7359%] (p = 0.20 > 0.05)
                        No change in performance detected.
Poseidon: Top Level TS, Lifetime 2^8, Activation 2^18, Dimension 64, Base 8/- sign
                        time:   [545.09 µs 553.89 µs 562.84 µs]
                        change: [−5.8278% −3.4585% −1.1206%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Poseidon: Top Level TS, Lifetime 2^8, Activation 2^18, Dimension 64, Base 8/- verify
                        time:   [113.49 µs 113.79 µs 114.16 µs]
                        change: [+0.7713% +1.1337% +1.5420%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- gen: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 15.2s.
Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- gen
                        time:   [1.4829 s 1.4959 s 1.5096 s]
                        change: [−0.2349% +0.8040% +1.8197%] (p = 0.15 > 0.05)
                        No change in performance detected.
Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- sign
                        time:   [562.00 µs 573.22 µs 584.91 µs]
                        change: [−4.1135% −1.4721% +1.2489%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Poseidon: Top Level TS, Lifetime 2^18, Activation 2^18, Dimension 64, Base 8/- verify
                        time:   [125.79 µs 126.02 µs 126.34 µs]
                        change: [−0.9371% −0.7275% −0.4948%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- g...: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 15.1s.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- g...
                        time:   [1.5088 s 1.5156 s 1.5235 s]
                        change: [−1.1062% −0.4577% +0.1903%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- s...
                        time:   [570.15 µs 580.74 µs 591.83 µs]
                        change: [−3.5380% −0.9571% +1.6316%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 64, Base 8 (Hashing Optimized)/- v...
                        time:   [140.00 µs 140.28 µs 140.62 µs]
                        change: [−1.3207% −0.7931% −0.2829%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- gen: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 13.0s.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- gen
                        time:   [1.3056 s 1.3214 s 1.3418 s]
                        change: [−2.0858% −0.7713% +0.8905%] (p = 0.36 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- sign
                        time:   [555.94 µs 565.76 µs 575.64 µs]
                        change: [−8.0486% −5.4577% −2.8252%] (p = 0.00 < 0.05)
                        Performance has improved.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 48, Base 10 (Trade-off)/- verify
                        time:   [156.31 µs 156.58 µs 156.87 µs]
                        change: [−1.8368% −1.4659% −1.1186%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- gen: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 17.4s.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- gen
                        time:   [1.7389 s 1.7588 s 1.7871 s]
                        change: [−7.0834% −5.6599% −3.9971%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- sig...
                        time:   [871.07 µs 890.99 µs 911.04 µs]
                        change: [−9.4954% −6.6566% −3.7032%] (p = 0.00 < 0.05)
                        Performance has improved.
Poseidon: Top Level TS, Lifetime 2^32, Activation 2^18, Dimension 32, Base 26 (Size Optimized)/- ver...
                        time:   [249.05 µs 249.33 µs 249.65 µs]
                        change: [−1.0350% −0.7428% −0.4663%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

Vindaar commented Dec 15, 2025

I have combined @tcoratger's work from Vindaar#1 with my attempt to extend it further (tcoratger#1) into a single commit.

I ran the benchmarks for different cases using:

RUSTFLAGS="-C target-cpu=native" cargo bench --bench benchmark --features with-gen-benches-poseidon-top-level -- --measurement-time 60

The 60 s measurement time is needed because otherwise the 2^32 case would collect only very few samples. I only ran the first and last benchmarks from here:

https://github.com/leanEthereum/leanSig/blob/main/benches/benchmark_poseidon_top_level.rs#L102-L142

The results are summarized in the table below.

| Commit  | Branch            | Benchmark   |   Time | Unit |  vs Base |
|---------+-------------------+-------------+--------+------+----------|
| d9610e7 | main (baseline)   | 2^8 Gen     | 3.2912 | ms   | 1.000000 |
| d9610e7 | main (baseline)   | 2^8 Sign    | 541.06 | µs   | 1.000000 |
| d9610e7 | main (baseline)   | 2^8 Verify  | 119.55 | µs   | 1.000000 |
| d9610e7 | main (baseline)   | 2^32 Gen    | 1.7126 | s    | 1.000000 |
| d9610e7 | main (baseline)   | 2^32 Sign   | 929.24 | µs   | 1.000000 |
| d9610e7 | main (baseline)   | 2^32 Verify | 267.04 | µs   | 1.000000 |
|---------+-------------------+-------------+--------+------+----------|
| 3cc78ea | packingForTree    | 2^8 Gen     | 3.6427 | ms   | 1.106800 |
| 3cc78ea | packingForTree    | 2^8 Sign    | 540.99 | µs   | 0.999871 |
| 3cc78ea | packingForTree    | 2^8 Verify  | 117.42 | µs   | 0.982183 |
| 3cc78ea | packingForTree    | 2^32 Gen    | 1.7343 | s    | 1.012671 |
| 3cc78ea | packingForTree    | 2^32 Sign   | 920.30 | µs   | 0.990379 |
| 3cc78ea | packingForTree    | 2^32 Verify | 266.54 | µs   | 0.998128 |
|---------+-------------------+-------------+--------+------+----------|
| d448dd8 | experimental-simd | 2^8 Gen     | 3.2979 | ms   | 1.002036 |
| d448dd8 | experimental-simd | 2^8 Sign    | 543.10 | µs   | 1.003770 |
| d448dd8 | experimental-simd | 2^8 Verify  | 119.46 | µs   | 0.999247 |
| d448dd8 | experimental-simd | 2^32 Gen    | 1.7162 | s    | 1.002102 |
| d448dd8 | experimental-simd | 2^32 Sign   | 950.90 | µs   | 1.023309 |
| d448dd8 | experimental-simd | 2^32 Verify | 268.61 | µs   | 1.005879 |
|---------+-------------------+-------------+--------+------+----------|
| c9e234f | packingForTree    | 2^8 Gen     | 3.1320 | ms   | 0.951629 |
| c9e234f | packingForTree    | 2^8 Sign    | 546.26 | µs   | 1.009611 |
| c9e234f | packingForTree    | 2^8 Verify  | 118.19 | µs   | 0.988624 |
| c9e234f | packingForTree    | 2^32 Gen    | 1.6874 | s    | 0.985286 |
| c9e234f | packingForTree    | 2^32 Sign   | 924.01 | µs   | 0.994372 |
| c9e234f | packingForTree    | 2^32 Verify | 273.48 | µs   | 1.024116 |

All of these changes are marginal. My initial packing attempt (3cc78ea) was a regression for the 2^8 gen case (about 11% slower). I think the improvement in the last commit (c9e234f) comes mainly from also applying the changes to the compute_tree_leaves function.

@tcoratger tcoratger left a comment

I left comments on a couple of minor issues: some clippy warnings, and functions that could be removed to make the PR lighter.

Personally, I'm in favor of merging this. Even if the gain is marginal, key generation takes significant time in the production setup, so it could save a few minutes.

I also think it allows us to properly set up our SIMD environment so that in the future, if we find better ways to improve it, the basics are in place and the subsequent work will be less extensive.

I'm still surprised that the gain is so marginal; I was expecting something more in the range of 5-10%. So either there's something we haven't considered yet that I need to think about further, or it's simply the way it is and we can't push it much further.

I'll let @b-wagn give his final opinion on this before merging.

/// This vertical packing enables efficient SIMD operations where a single instruction
/// processes the same element position across multiple arrays simultaneously.
#[inline]
#[inline(always)]
Contributor

I think you can remove the always here, with clippy in mind; it should not make a big difference (at least in my local experiments, where I just tried removing it).
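The doc comment in the diff above describes a "vertical" layout, in which one SIMD instruction processes the same element position across several arrays. Below is a minimal sketch of that transposition using plain arrays. The helper names are hypothetical and this is not the leanSig code; in the real implementation each row would presumably be a packed field type rather than [T; WIDTH]. It uses plain #[inline], as suggested in the review.

```rust
/// Hypothetical helper: turn WIDTH "horizontal" arrays (one per item) into N
/// "vertical" rows, so that row j holds element j of every item, one per lane.
#[inline]
fn pack_vertical<T: Copy, const N: usize, const WIDTH: usize>(
    items: &[[T; N]; WIDTH],
) -> [[T; WIDTH]; N] {
    std::array::from_fn(|j| std::array::from_fn(|lane| items[lane][j]))
}

/// Inverse of `pack_vertical`.
#[inline]
fn unpack_vertical<T: Copy, const N: usize, const WIDTH: usize>(
    packed: &[[T; WIDTH]; N],
) -> [[T; N]; WIDTH] {
    std::array::from_fn(|lane| std::array::from_fn(|j| packed[j][lane]))
}

fn main() {
    // Two items (WIDTH = 2 lanes), each with three elements (N = 3).
    let items = [[1, 2, 3], [10, 20, 30]];
    let packed = pack_vertical(&items);
    assert_eq!(packed, [[1, 10], [2, 20], [3, 30]]);

    // A lane-wise operation on row j touches element j of every item,
    // which is what a single SIMD instruction does on a packed value.
    let doubled: [[i32; 2]; 3] = packed.map(|row| row.map(|x| 2 * x));
    assert_eq!(unpack_vertical(&doubled), [[2, 4, 6], [20, 40, 60]]);
    println!("{packed:?}");
}
```

The transpose costs some shuffling up front. It pays off when many operations (for example all rounds of a permutation) run on the packed rows before unpacking.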

Vindaar commented Dec 16, 2025

Addressed everything including cargo clippy and cargo fmt.

@tcoratger

> Addressed everything including cargo clippy and cargo fmt.

Thanks a lot! For me this is ready to merge. @b-wagn, let me know what you think.

@b-wagn b-wagn left a comment

Thanks for this! This is great work. I found just one potential, non-critical bug.
I also like that you have so many tests ensuring consistency!

Even though I marked it as approved, I still question whether we should really merge it, given that it makes the already complex Merkle tree code even more complex while the efficiency improvements seem very small. But that can be discussed, and I am totally open to merging this one :)

// Hash children into their parent using the tweak
Self::apply(
parameter,
&Self::tree_tweak(level + 1, parent_pos),
Contributor

Is this level + 1 correct? Comparing with the old code and unpacking the function call, it looks like we now add 1 to the level twice, whereas the old code added it only once. Maybe this did not show up in tests because this default implementation is not used. Could you add a test that also checks this?

Contributor

Just checked: above, the scalar implementation you use for testing passes just level.

Contributor Author

Oh, great catch! Yeah, I copied the code for the default implementation before deciding to change the API to pass level + 1 directly, and then didn't update the default.
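For reference, here is a minimal sketch of the kind of test requested in this thread. It uses the same simplifications as a toy model: u64 nodes, a non-cryptographic mixer, and hypothetical signatures that only loosely mirror Self::apply and Self::tree_tweak from the diff. The test compares the trait default, which now receives the parent level directly, against a reference that follows the old convention of taking the children's level and adding 1 once. Reintroducing level + 1 inside the default makes the assertion fail.

```rust
// Hypothetical, simplified sketch; not the leanSig API or hash.
type Node = u64;

fn toy_mix(parameter: u64, tweak: (u8, u64), left: Node, right: Node) -> Node {
    let mut x = parameter ^ ((tweak.0 as u64) << 56) ^ tweak.1;
    for v in [left, right] {
        x = (x ^ v).wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(29);
    }
    x
}

trait TweakableHash {
    fn tree_tweak(level: u8, pos: u64) -> (u8, u64) {
        (level, pos)
    }

    fn apply(parameter: u64, tweak: &(u8, u64), left: Node, right: Node) -> Node;

    /// Hashes pairs of children into parents.
    /// `level` is the level of the *parents*: callers pass `child_level + 1`,
    /// so the default must use it as-is (adding 1 again is the bug).
    fn compute_tree_layer(parameter: u64, level: u8, parent_start: u64, children: &[Node]) -> Vec<Node> {
        children
            .chunks_exact(2)
            .enumerate()
            .map(|(i, pair)| {
                let tweak = Self::tree_tweak(level, parent_start + i as u64);
                Self::apply(parameter, &tweak, pair[0], pair[1])
            })
            .collect()
    }
}

struct ToyHash;
impl TweakableHash for ToyHash {
    fn apply(parameter: u64, tweak: &(u8, u64), left: Node, right: Node) -> Node {
        toy_mix(parameter, *tweak, left, right)
    }
}

/// Reference mirroring the old convention: takes the children's level and
/// adds 1 exactly once when building the tweak.
fn old_style_layer(parameter: u64, child_level: u8, parent_start: u64, children: &[Node]) -> Vec<Node> {
    let mut parents = Vec::new();
    for (i, pair) in children.chunks_exact(2).enumerate() {
        let tweak = ToyHash::tree_tweak(child_level + 1, parent_start + i as u64);
        parents.push(ToyHash::apply(parameter, &tweak, pair[0], pair[1]));
    }
    parents
}

fn main() {
    let children: Vec<Node> = (100..116).collect();
    for child_level in 0..8u8 {
        let new_api = ToyHash::compute_tree_layer(42, child_level + 1, 3, &children);
        let old = old_style_layer(42, child_level, 3, &children);
        assert_eq!(new_api, old, "level mismatch at child level {child_level}");
    }
    println!("default compute_tree_layer matches the old single-increment behavior");
}
```

Because the toy mixer depends injectively on the tweak level, any extra increment changes every parent. A single comparison per level is therefore enough to catch the bug.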

}
}

fn compute_tree_layer(
Contributor

I think this function needs documentation. In particular, it is not clear whether I should pass level or level + 1 when calling it (see my comment on the default impl).

Contributor Author

Yep, that should help. Added documentation: a longer comment for the default and a short one for the Poseidon SIMD variant.

@tcoratger tcoratger merged commit b22452d into leanEthereum:main Dec 17, 2025
2 checks passed