Optim: bit-reversal permutation with cache-blocking #813

mratsim · 2024-08-27T22:18:58Z

Hey team,

I've been looking around the repo and noticed that you consider bit-reversal permutations important enough to track in your benchmarks.

There is also a TODO mentioning a cache-friendly implementation:

stwo/crates/prover/src/core/utils.rs

Lines 136 to 153 in 387a072

    
    /// Performs a naive bit-reversal permutation inplace.
    ///
    /// # Panics
    ///
    /// Panics if the length of the slice is not a power of two.
    // TODO: Implement cache friendly implementation.
    // TODO(spapini): Move this to the cpu backend.
    pub fn bit_reverse<T>(v: &mut [T]) {
        let n = v.len();
        assert!(n.is_power_of_two());
        let log_n = n.ilog2();
        for i in 0..n {
            let j = bit_reverse_index(i, log_n);
            if j > i {
                v.swap(i, j);
            }
        }
    }
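The excerpt calls `bit_reverse_index`, which lives elsewhere in `utils.rs` and is not shown. A minimal self-contained sketch of the same naive loop, assuming `bit_reverse_index(i, log_n)` simply reverses the low `log_n` bits of `i` (our stand-in, not necessarily stwo's exact code):

```rust
/// Stand-in for stwo's `bit_reverse_index` (an assumption, not stwo's code):
/// reverses the low `log_size` bits of `i`, e.g. 0b001 -> 0b100 for log_size = 3.
fn bit_reverse_index(i: usize, log_size: u32) -> usize {
    if log_size == 0 {
        return 0; // only i == 0 is a valid index when the slice has length 1
    }
    i.reverse_bits() >> (usize::BITS - log_size)
}

/// Naive in-place bit-reversal permutation, same loop as the excerpt above.
///
/// # Panics
///
/// Panics if the length of the slice is not a power of two.
pub fn bit_reverse<T>(v: &mut [T]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let log_n = n.ilog2();
    for i in 0..n {
        let j = bit_reverse_index(i, log_n);
        // Swap each pair exactly once; fixed points (j == i) stay put.
        if j > i {
            v.swap(i, j);
        }
    }
}
```

For n = 8 this turns `[0, 1, ..., 7]` into `[0, 4, 2, 6, 1, 5, 3, 7]`. For large n, consecutive `i` map to distant `j` (for even `i`, `rev(i + 1) = rev(i) + n/2`), so nearly every swap touches a cache line far from the previous one; that access pattern is what cache blocking targets.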

The following in-place algorithm uses cache blocking and is 33% faster than the naive one (measured on my machine at EIP-4844 size):

https://github.com/mratsim/constantine/blob/65147ed/constantine/math/polynomials/fft.nim#L203-L295
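To make the idea concrete in Rust, here is a minimal sketch of a cache-blocked permutation written from the Carter and Gatlin tiling idea; it is not a port of the Constantine code linked above, it has not been benchmarked, and the names `bit_reverse_blocked`, `rev`, and the `log_tile` parameter are our own hypothetical choices. Each index is split into a high field `a` and a low field `c` of `log_tile` bits around a middle field `b`; blocks `b` and `rev(b)` are then exchanged through a small scratch tile, so every pass over the large slice reads or writes runs of `2^log_tile` contiguous elements.

```rust
/// Reverses the low `bits` bits of `x`.
fn rev(x: usize, bits: u32) -> usize {
    if bits == 0 {
        return 0;
    }
    x.reverse_bits() >> (usize::BITS - bits)
}

/// Naive loop, used as a fallback for small inputs.
fn bit_reverse_naive<T>(v: &mut [T]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let log_n = n.ilog2();
    for i in 0..n {
        let j = rev(i, log_n);
        if j > i {
            v.swap(i, j);
        }
    }
}

/// Hypothetical cache-blocked bit reversal. Index bits are laid out as
/// [a: log_tile | b: log_b | c: log_tile], and rev(a, b, c) = (rev c, rev b, rev a).
pub fn bit_reverse_blocked<T: Copy>(v: &mut [T], log_tile: u32) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let log_n = n.ilog2();
    if log_n < 2 * log_tile {
        bit_reverse_naive(v); // too small to split into (a, b, c)
        return;
    }
    let log_b = log_n - 2 * log_tile;
    let tile = 1usize << log_tile;
    // Reversal table for tile-sized fields, shared by `a` and `c`.
    let rev_t: Vec<usize> = (0..tile).map(|x| rev(x, log_tile)).collect();
    let mut buf = vec![v[0]; tile * tile]; // scratch tile, buf[a * tile + c]
    let idx = |a: usize, b: usize, c: usize| (a << (log_b + log_tile)) | (b << log_tile) | c;

    for b in 0..(1usize << log_b) {
        let br = rev(b, log_b);
        if br < b {
            continue; // this pair was handled when the smaller index came up
        }
        // Load block b row by row: each row is `tile` contiguous elements.
        for a in 0..tile {
            for c in 0..tile {
                buf[a * tile + c] = v[idx(a, b, c)];
            }
        }
        if br == b {
            // Self-paired block: new v[a, b, c] = old v[rev c, b, rev a].
            for a in 0..tile {
                for c in 0..tile {
                    v[idx(a, b, c)] = buf[rev_t[c] * tile + rev_t[a]];
                }
            }
        } else {
            // (a, b, c) pairs with (rev c, br, rev a); iterating over ra = rev a in
            // the inner loop makes the accesses into `v` contiguous.
            for c in 0..tile {
                let rc = rev_t[c];
                for ra in 0..tile {
                    let a = rev_t[ra];
                    std::mem::swap(&mut buf[a * tile + c], &mut v[idx(rc, br, ra)]);
                }
            }
            // The tile now holds old block br in exactly the order block b needs.
            for a in 0..tile {
                for c in 0..tile {
                    v[idx(a, b, c)] = buf[a * tile + c];
                }
            }
        }
    }
}
```

The main tuning knob is `log_tile`: the scratch tile should fit comfortably in L1, e.g. with 32-byte elements `log_tile = 4` gives a 16 × 16 tile of 8 KiB. The sketch needs `T: Copy` for the scratch buffer and is in place only up to that extra tile of `4^log_tile` elements.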

Reference papers

Towards an Optimal Bit-Reversal Permutation Program
Larry Carter and Kang Su Gatlin, 1998
https://csaws.cs.technion.ac.il/~itai/Courses/Cache/bit.pdf

Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
Knauth, Adas, Whitfield, Wang, Ickler, Conrad, and Serang, 2017
https://arxiv.org/pdf/1708.01873.pdf
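The tiling in the Carter and Gatlin paper rests on an index split along these lines (notation ours). Let $n = 2^{k}$ with $k = 2q + m$, and split an index $i$ into a high field $a$ ($q$ bits), a middle field $b$ ($m$ bits) and a low field $c$ ($q$ bits):

$$
i = a\,2^{m+q} + b\,2^{q} + c, \qquad 0 \le a, c < 2^{q}, \quad 0 \le b < 2^{m}.
$$

Reversing all $k$ bits reverses the order of the fields and each field individually:

$$
\operatorname{rev}_k(i) = \operatorname{rev}_q(c)\,2^{m+q} + \operatorname{rev}_m(b)\,2^{q} + \operatorname{rev}_q(a).
$$

So the $2^q \times 2^q$ elements sharing middle field $b$ (row $a$, column $c$) all land in the block with middle field $b' = \operatorname{rev}_m(b)$, at row $\operatorname{rev}_q(c)$ and column $\operatorname{rev}_q(a)$: a transpose combined with index reversal. Each pair of blocks $(b, b')$ can therefore be permuted with a tile-sized working set while every access to the large array runs along rows of $2^q$ contiguous elements. For example, with $k = 4$ and $q = 1$, $i = 3 = 0\,|\,01\,|\,1$ maps to $1\,|\,10\,|\,0 = 12$.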

The speedup has been independently confirmed on x86 in gnark (Consensys/gnark-crypto#446), although there the blocked version is slower than the naive one on Apple machines, probably because of their much higher memory bandwidth.

Image courtesy of @gbotrel (AMD desktop).
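Since whether blocking pays off clearly depends on the machine, a quick way to check a given box is a best-of-N timing harness. The sketch below is a hypothetical helper (`time_permutation` is our name, not an existing API) that uses only `std::time::Instant` and `std::hint::black_box`; it is no substitute for a proper benchmark framework.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: best-of-`reps` wall-clock time for running `permute`
/// on a vector of 2^log_n `u64`s. Taking the minimum filters out scheduler noise.
fn time_permutation(permute: fn(&mut [u64]), log_n: u32, reps: u32) -> Duration {
    let mut v: Vec<u64> = (0..1u64 << log_n).collect();
    let mut best = Duration::MAX;
    for _ in 0..reps {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the permutation.
        permute(black_box(v.as_mut_slice()));
        best = best.min(start.elapsed());
    }
    best
}
```

Any function with the signature `fn(&mut [u64])` can be plugged in, so a naive and a blocked implementation can be compared at the same `log_n` (for instance `log_n = 12` for EIP-4844-sized inputs, or larger sizes that exceed the cache). Running the same comparison on both an x86 machine and an Apple machine is the simplest way to see the effect described above.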
