Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optim: bit-reversal permutation with cache-blocking #813

Open
mratsim opened this issue Aug 27, 2024 · 0 comments
Open

Optim: bit-reversal permutation with cache-blocking #813

mratsim opened this issue Aug 27, 2024 · 0 comments

Comments

@mratsim
Copy link

mratsim commented Aug 27, 2024

Hey team,

I've been looking around the repo and I've seen that you consider bit-reversal permutations important enough to track them in benchmarks.

There is also a mention of cache-friendly algorithm:

/// Performs a naive bit-reversal permutation inplace.
///
/// # Panics
///
/// Panics if the length of the slice is not a power of two.
// TODO: Implement cache friendly implementation.
// TODO(spapini): Move this to the cpu backend.
pub fn bit_reverse<T>(v: &mut [T]) {
let n = v.len();
assert!(n.is_power_of_two());
let log_n = n.ilog2();
for i in 0..n {
let j = bit_reverse_index(i, log_n);
if j > i {
v.swap(i, j);
}
}
}

The following is an in-place algorithm that is 33% faster than naive using cache-blocking (on my machine for EIP-4844 size):

https://github.com/mratsim/constantine/blob/65147ed/constantine/math/polynomials/fft.nim#L203-L295

Reference papers

The performance improvement has been independently confirmed in Gnark Consensys/gnark-crypto#446 on x86 (though it's slower than naive on Apple, probably due to significant memory bandwidth there).

Image courtesy of @gbotrel (amd desktop)
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant