Hey team,
I've been looking around the repo and noticed that you consider bit-reversal permutations important enough to track in benchmarks.
There is also a mention of a cache-friendly algorithm in `stwo/crates/prover/src/core/utils.rs` (lines 136 to 153 at commit 387a072).
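For comparison, here is a minimal sketch of the naive in-place permutation: each index `i` is swapped once with its bit-reversed partner `rev(i)`. Because consecutive `i` differ in their low bits, which become the high bits of `rev(i)`, the swap targets jump across the whole array, which is what makes this approach cache-unfriendly on large inputs. The names `rev_bits` and `bit_reverse_naive` are hypothetical and not taken from stwo's `utils.rs`.

```rust
/// Reverse the low `bits` bits of `x` (assumes `x < 2^bits`).
pub fn rev_bits(x: usize, bits: u32) -> usize {
    if bits == 0 {
        0 // avoid an out-of-range shift by usize::BITS
    } else {
        x.reverse_bits() >> (usize::BITS - bits)
    }
}

/// Naive in-place bit-reversal permutation (hypothetical helper).
/// Element at index i moves to index rev(i); length must be a power of two.
pub fn bit_reverse_naive<T>(v: &mut [T]) {
    let n = v.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let log_n = n.trailing_zeros();
    for i in 0..n {
        let j = rev_bits(i, log_n);
        // Swap each pair once; fixed points (i == j) stay put.
        if i < j {
            v.swap(i, j);
        }
    }
}
```

For `v = [0, 1, ..., 7]` this yields `[0, 4, 2, 6, 1, 5, 3, 7]`, and applying it twice restores the original order.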
The following in-place algorithm uses cache blocking and is 33% faster than the naive approach (on my machine, at EIP-4844 size):
https://github.com/mratsim/constantine/blob/65147ed/constantine/math/polynomials/fft.nim#L203-L295
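The blocking idea, in the spirit of Carter and Gatlin's paper listed under the reference papers, splits an index of `log_n` bits into a high part `a` (`q` bits), a middle part `b` (`log_n - 2q` bits) and a low part `c` (`q` bits). Then `rev(a, b, c) = (rev(c), rev(b), rev(a))`, so every element whose middle bits equal `b` lands in the tile whose middle bits equal `rev(b)`. Each pair of tiles can therefore be gathered into small buffers with contiguous reads and scattered back with contiguous writes. The Rust sketch below is an illustration of that technique under these assumptions, not a port of the linked Nim code; `bit_reverse_blocked` and the tile parameter `q` are hypothetical names, and it uses `2 * 4^q` elements of scratch, so it is only in-place up to that buffer.

```rust
/// Reverse the low `bits` bits of `x` (assumes `x < 2^bits`).
pub fn rev_bits(x: usize, bits: u32) -> usize {
    if bits == 0 { 0 } else { x.reverse_bits() >> (usize::BITS - bits) }
}

/// Cache-blocked bit-reversal permutation (hypothetical sketch).
/// Index layout: i = a << (log_n - q) | b << q | c.
pub fn bit_reverse_blocked<T: Copy + Default>(v: &mut [T], q: u32) {
    let n = v.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let log_n = n.trailing_zeros();
    let q = q.min(log_n / 2); // need at least 2q bits to split off a and c
    let tile = 1usize << q;
    let mid_bits = log_n - 2 * q;
    let hi_shift = log_n - q; // `a` occupies the top q bits
    // Scratch tiles, indexed as [rev(a)][c].
    let mut t1 = vec![T::default(); tile * tile];
    let mut t2 = vec![T::default(); tile * tile];
    for b in 0..(1usize << mid_bits) {
        let b_rev = rev_bits(b, mid_bits);
        if b > b_rev {
            continue; // this tile pair was already exchanged
        }
        // Gather: for fixed a, the c-run is contiguous in memory.
        for a in 0..tile {
            let ra = rev_bits(a, q);
            let src_b = (a << hi_shift) | (b << q);
            let src_r = (a << hi_shift) | (b_rev << q);
            for c in 0..tile {
                t1[ra * tile + c] = v[src_b | c];
                t2[ra * tile + c] = v[src_r | c];
            }
        }
        // Scatter: (a, b, c) -> (rev(c), rev(b), rev(a)); writes are
        // contiguous in rev(a) for fixed c. If b == b_rev, both writes agree.
        for c in 0..tile {
            let rc = rev_bits(c, q);
            let dst_r = (rc << hi_shift) | (b_rev << q);
            let dst_b = (rc << hi_shift) | (b << q);
            for ra in 0..tile {
                v[dst_r | ra] = t1[ra * tile + c];
                v[dst_b | ra] = t2[ra * tile + c];
            }
        }
    }
}
```

The tile parameter `q` is the tuning knob: with 32-byte field elements and `q = 4`, each tile holds 256 elements (8 KiB), so the two buffers take 16 KiB. The best value for a given CPU is something to benchmark rather than assume.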
Reference papers
Towards an Optimal Bit-Reversal Permutation Program
Larry Carter and Kang Su Gatlin, 1998
https://csaws.cs.technion.ac.il/~itai/Courses/Cache/bit.pdf
Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
Knauth, Adas, Whitfield, Wang, Ickler, Conrad, Serang, 2017
https://arxiv.org/pdf/1708.01873.pdf
The performance improvement has been independently confirmed in gnark (Consensys/gnark-crypto#446) on x86, though it is slower than the naive version on Apple hardware, probably because of its high memory bandwidth.
Image courtesy of @gbotrel (AMD desktop)