SIMD packing for tweak hash tree #21
Conversation
I just combined @tcoratger's work from here: Vindaar#1 and my attempt to extend that even further from here: tcoratger#1 in a combined commit. I ran the benchmarks for the different cases. The 60s target time is because the 2^32 test case would otherwise only have very few samples. I only ran the very first and last benchmark from here: https://github.com/leanEthereum/leanSig/blob/main/benches/benchmark_poseidon_top_level.rs#L102-L142

The results are summarized in the table below. Essentially all of these are only marginal improvements. The initial packing attempt of mine was a regression for the 2^8 gen case. I think the main improvement in the last commit is because we now also apply the changes to the
tcoratger
left a comment
I just commented on a couple of minor issues: some clippy nits, and functions that could be removed to make the PR lighter.
Personally, I'm in favor of merging this. Even if it's a marginal gain, given the significant time for key generation involved in the production setup, it could save a few minutes.
I also think it allows us to properly set up our SIMD environment so that in the future, if we find better ways to improve it, the basics are in place and the subsequent work will be less extensive.
I'm still surprised that the gain is so marginal; I was expecting something more in the range of 5-10%. So either there's something we haven't considered yet that I need to think about further, or it's simply the way it is and we can't push it much further.
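One way to sanity-check the marginal gain is Amdahl's law: if the tree-layer hashing is only a small fraction of key generation (the leaf hashing dominates, as noted later in the thread), even a large SIMD speedup on that fraction caps the overall gain. A toy calculation with illustrative, unmeasured fractions:

```rust
/// Amdahl's law: overall speedup when only a fraction `f` of the total work
/// is accelerated by factor `s`. The numbers below are illustrative only,
/// not measurements from this PR.
fn amdahl(f: f64, s: f64) -> f64 {
    1.0 / ((1.0 - f) + f / s)
}
```

If the tree layers were, say, 10% of keygen time and SIMD made them 4x faster, `amdahl(0.10, 4.0)` gives only about a 1.08x overall speedup, which would be consistent with the marginal improvements observed.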
I'll let @b-wagn give his final opinion on this before I can merge.
src/simd_utils.rs
Outdated
```rust
/// This vertical packing enables efficient SIMD operations where a single instruction
/// processes the same element position across multiple arrays simultaneously.
#[inline]
#[inline(always)]
```
Here I think that with clippy you can remove the `always`; it should not make a big difference (at least from my personal experiments locally, because I've just tried removing it).
No, I do not prefer
```
for (j, <item>) in output.iter_mut().enumerate().take(PackedF::WIDTH){
```
thank you, Clippy.
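For context, this is clippy's `needless_range_loop` lint: it suggests replacing an indexed `for j in 0..WIDTH` loop with the iterator form quoted above. A minimal sketch of the two equivalent styles (the names and `WIDTH` stand-in are hypothetical, not the real leanSig code):

```rust
// Stand-in for PackedF::WIDTH in the real code.
const WIDTH: usize = 4;

/// Plain indexed loop: readable, but triggers clippy's needless_range_loop.
fn pack_indexed(input: &[u32], output: &mut [u32; WIDTH]) {
    for j in 0..WIDTH {
        output[j] = input[j];
    }
}

/// Clippy's suggested iterator form, as quoted in the comment above.
fn pack_iterator(input: &[u32], output: &mut [u32; WIDTH]) {
    for (j, item) in output.iter_mut().enumerate().take(WIDTH) {
        *item = input[j];
    }
}
```

Both compile to the same behavior; the disagreement here is purely about readability.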
Was removed in tcoratger@e8a7273 and I overlooked it while squashing a bunch of our commits.
Addressed everything including
Thanks a lot! For me this is ready to merge. @b-wagn, let me know what you think.
b-wagn
left a comment
Thanks for this! This is great work. Just one potential non-critical bug that I found.
I also like that you have so many tests ensuring consistency!
Even though I marked it as approved, I still question if we should really merge it, given that it makes the already complex Merkle tree code even more complex, and the efficiency improvements seem very small. But that can be discussed, and I am totally open to merging this one :)
src/symmetric/tweak_hash.rs
Outdated
```rust
// Hash children into their parent using the tweak
Self::apply(
    parameter,
    &Self::tree_tweak(level + 1, parent_pos),
```
Is this `level + 1` correct? If I compare with the old code and unpack the function call, this would mean that we now add 1 to the level twice, whereas we did so only once in the old code. Maybe this did not show up in tests because this default implementation is not used. Can you maybe add a test that also checks this?
Just checked, and above you seem to have just `level` in the scalar implementation that you use for testing.
Oh, great catch! Yeah, I copied the code for the default implementation before I decided to just change the API to pass `level + 1` directly, and didn't update the default.
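For illustration, the double increment caught here can be reduced to a minimal sketch (hypothetical toy functions, not the real API): once the caller already passes the parent level, the default implementation must not add 1 again.

```rust
/// Toy stand-in for building a tweak from a (level, position) pair.
fn tree_tweak(level: u8, pos: u32) -> (u8, u32) {
    (level, pos)
}

/// Buggy default: the caller already passes the parent level (`level + 1`
/// at the call site), and we add 1 again inside -- incrementing twice.
fn parent_tweak_buggy(parent_level: u8, parent_pos: u32) -> (u8, u32) {
    tree_tweak(parent_level + 1, parent_pos)
}

/// Fixed default: use the level exactly as passed in.
fn parent_tweak_fixed(parent_level: u8, parent_pos: u32) -> (u8, u32) {
    tree_tweak(parent_level, parent_pos)
}
```

A consistency test comparing the default implementation against the scalar reference would catch exactly this mismatch.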
```rust
    }
}

fn compute_tree_layer(
```
I think this function needs documentation. In particular, it is not clear whether I should pass `level` or `level + 1` here when I use it (see the comment below on the default impl).
Yep, that should help. Added one: a longer one for the default and a short one for the Poseidon SIMD variant.
Implements `compute_tree_level` as a trait function of the `TweakableHash`, which defaults to the scalar implementation used in `new_subtree` in the past. For the Poseidon tweaks implementation we override this with a SIMD packing variant, following the ideas from `compute_tree_leaves`.

edit:
I now ran the benchmarks with the key gen timing on. I.e. I ran:
(and an initial run without target native on the main branch).
Here is a summary table. For the full output see the folded section below. This is on an AMD Ryzen 9 5950X.
As expected, the improvements here are only marginal, because the hashing for the Merkle leaves is much more expensive than computing the tree layers.
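The trait-with-default-override design the PR description outlines can be sketched roughly like this (all names and the toy hash are illustrative, not the real leanSig API):

```rust
/// Hedged sketch: a trait method with a scalar default, overridden by a
/// SIMD-style packed variant for one implementation.
trait TweakHashSketch {
    fn hash_pair(&self, level: u8, left: u64, right: u64) -> u64;

    /// Default: scalar loop over parent positions, as `new_subtree` did before.
    fn compute_tree_layer(&self, level: u8, children: &[u64]) -> Vec<u64> {
        children
            .chunks_exact(2)
            .map(|p| self.hash_pair(level, p[0], p[1]))
            .collect()
    }
}

struct Scalar;
impl TweakHashSketch for Scalar {
    fn hash_pair(&self, level: u8, left: u64, right: u64) -> u64 {
        // Toy mixing function standing in for the tweakable Poseidon hash.
        (level as u64).wrapping_add(left ^ right.rotate_left(7))
    }
}

struct Packed;
impl TweakHashSketch for Packed {
    fn hash_pair(&self, level: u8, left: u64, right: u64) -> u64 {
        (level as u64).wrapping_add(left ^ right.rotate_left(7))
    }

    /// Override: walk the layer in blocks of WIDTH parents; in real SIMD code
    /// each block would be handled by packed field operations.
    fn compute_tree_layer(&self, level: u8, children: &[u64]) -> Vec<u64> {
        const WIDTH: usize = 4; // stand-in for PackedF::WIDTH
        let mut out = Vec::with_capacity(children.len() / 2);
        for block in children.chunks(2 * WIDTH) {
            for p in block.chunks_exact(2) {
                out.push(self.hash_pair(level, p[0], p[1]));
            }
        }
        out
    }
}
```

The consistency tests mentioned in the review then amount to checking that both implementations produce identical layers on the same input.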
Baseline SIMD:
Click to expand baseline SIMD results
PR SIMD:
Click to expand PR SIMD results