
Conversation

@tcoratger
Contributor

Benchmarks compared to before (on small dimensions the very small degradation is noise; I ran the benchmarks several times and the small-dimension results differ a bit each time, but on large dimensions this is consistently a big win).

add_multilinears_fn/optimized/10
                        time:   [50.002 µs 50.549 µs 51.090 µs]
                        thrpt:  [20.043 Melem/s 20.258 Melem/s 20.479 Melem/s]
                 change:
                        time:   [+1.2641% +3.4338% +5.7785%] (p = 0.00 < 0.05)
                        thrpt:  [−5.4629% −3.3198% −1.2483%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
add_multilinears_fn/optimized/14
                        time:   [76.380 µs 77.374 µs 78.297 µs]
                        thrpt:  [209.25 Melem/s 211.75 Melem/s 214.51 Melem/s]
                 change:
                        time:   [+1.7252% +3.9230% +6.2144%] (p = 0.00 < 0.05)
                        thrpt:  [−5.8508% −3.7749% −1.6960%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
add_multilinears_fn/optimized/18
                        time:   [112.78 µs 117.96 µs 125.47 µs]
                        thrpt:  [2.0893 Gelem/s 2.2223 Gelem/s 2.3244 Gelem/s]
                 change:
                        time:   [−18.608% −16.871% −14.746%] (p = 0.00 < 0.05)
                        thrpt:  [+17.297% +20.294% +22.862%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
add_multilinears_fn/optimized/20
                        time:   [315.30 µs 320.84 µs 329.73 µs]
                        thrpt:  [3.1801 Gelem/s 3.2682 Gelem/s 3.3257 Gelem/s]
                 change:
                        time:   [−52.253% −50.004% −47.859%] (p = 0.00 < 0.05)
                        thrpt:  [+91.787% +100.02% +109.44%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

add_multilinears_sparse/optimized/10
                        time:   [88.945 µs 90.308 µs 91.541 µs]
                        thrpt:  [715.92 Melem/s 725.70 Melem/s 736.81 Melem/s]
                 change:
                        time:   [−2.9823% +0.0009% +2.8453%] (p = 1.00 > 0.05)
                        thrpt:  [−2.7666% −0.0009% +3.0740%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
add_multilinears_sparse/optimized/50
                        time:   [88.014 µs 88.741 µs 89.512 µs]
                        thrpt:  [732.15 Melem/s 738.51 Melem/s 744.61 Melem/s]
                 change:
                        time:   [−9.0176% −7.0711% −5.2000%] (p = 0.00 < 0.05)
                        thrpt:  [+5.4853% +7.6092% +9.9113%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
add_multilinears_sparse/optimized/90
                        time:   [86.818 µs 89.123 µs 92.157 µs]
                        thrpt:  [711.13 Melem/s 735.34 Melem/s 754.87 Melem/s]
                 change:
                        time:   [−6.2371% −4.5692% −2.7610%] (p = 0.00 < 0.05)
                        thrpt:  [+2.8394% +4.7879% +6.6520%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
add_multilinears_sparse/optimized/100
                        time:   [85.672 µs 86.673 µs 87.685 µs]
                        thrpt:  [747.40 Melem/s 756.13 Melem/s 764.96 Melem/s]
                 change:
                        time:   [−3.8784% −1.3785% +0.8765%] (p = 0.27 > 0.05)
                        thrpt:  [−0.8689% +1.3978% +4.0349%]
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

When doing this

let batched_column_mixed = add_multilinears(
        &column_up(&batched_column),
        &scale_poly(&column_down(&batched_column), alpha),
    );

two vectors are allocated:

  • one as the result of column_up(&batched_column)
  • one as the result of add_multilinears

To avoid the second allocation, you could do the addition in place:

let mut col_up = column_up(&batched_column);
add_multilinears_inplace(
    &mut col_up,
    &scale_poly(&column_down(&batched_column), alpha),
);
let batched_column_mixed = col_up;

with

/// Adds `src` into `dst` element-wise in parallel, skipping the zero entries of `src`.
pub fn add_multilinears_inplace<F: Field>(dst: &mut [F], src: &[F]) {
    assert_eq!(dst.len(), src.len());

    dst.par_iter_mut()
        .zip(src.par_iter())
        .filter(|(_, b)| !b.is_zero())
        .for_each(|(a, b)| *a += *b);
}

should be much more efficient.

@TomWambsgans
Collaborator

I am not sure of the:

if a.is_zero() {
    *b
} else if b.is_zero() {
    *a
} else { ... }

Because this assumes the polynomial is sparse, and in some cases we know for sure it is not, I would suggest removing the case disjunction (i.e. assume it is not sparse), or alternatively having two separate functions?

Also, add_multilinears_inplace seems like a good idea to me.
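
For the first option, here is a minimal sketch of what a variant without the case disjunction might look like (assuming the same Field trait and rayon-based parallel iteration used in the rest of the PR; add_multilinears_dense is a hypothetical name):

use rayon::prelude::*;

// Hypothetical dense variant: every pair of coefficients is added
// unconditionally, with no per-element zero checks.
pub fn add_multilinears_dense<F: Field>(a: &[F], b: &[F]) -> Vec<F> {
    assert_eq!(a.len(), b.len());

    a.par_iter()
        .zip(b.par_iter())
        .map(|(a, b)| *a + *b)
        .collect()
}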

@tcoratger
Contributor Author

pub fn add_multilinears_inplace<F: Field>(dst: &mut [F], src: &[F]) {
    assert_eq!(dst.len(), src.len());

    dst.par_iter_mut()
        .zip(src.par_iter())
        .filter(|(_, b)| !b.is_zero())
        .for_each(|(a, b)| *a += *b);
}

Just replaced it with the in-place version, which removes all the problems.
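
For reference, the call site would then look roughly like this (a sketch of the pattern only; the exact code in the PR may differ):

let mut batched_column_mixed = column_up(&batched_column);
add_multilinears_inplace(
    &mut batched_column_mixed,
    &scale_poly(&column_down(&batched_column), alpha),
);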

@TomWambsgans TomWambsgans merged commit 53afb35 into main Sep 23, 2025
3 checks passed
@tcoratger tcoratger deleted the add_multilinears branch September 23, 2025 15:00