utils: faster add_multilinears
#55
Merged
Benchmarks compared to before. On small dimensions the very small degradation is noise: I tested several runs and the results differ somewhat each time. On large dimensions this is always a big win.
add_multilinears_fn/optimized/10
time: [50.002 µs 50.549 µs 51.090 µs]
thrpt: [20.043 Melem/s 20.258 Melem/s 20.479 Melem/s]
change:
time: [+1.2641% +3.4338% +5.7785%] (p = 0.00 < 0.05)
thrpt: [−5.4629% −3.3198% −1.2483%]
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

add_multilinears_fn/optimized/14
time: [76.380 µs 77.374 µs 78.297 µs]
thrpt: [209.25 Melem/s 211.75 Melem/s 214.51 Melem/s]
change:
time: [+1.7252% +3.9230% +6.2144%] (p = 0.00 < 0.05)
thrpt: [−5.8508% −3.7749% −1.6960%]
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

add_multilinears_fn/optimized/18
time: [112.78 µs 117.96 µs 125.47 µs]
thrpt: [2.0893 Gelem/s 2.2223 Gelem/s 2.3244 Gelem/s]
change:
time: [−18.608% −16.871% −14.746%] (p = 0.00 < 0.05)
thrpt: [+17.297% +20.294% +22.862%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe

add_multilinears_fn/optimized/20
time: [315.30 µs 320.84 µs 329.73 µs]
thrpt: [3.1801 Gelem/s 3.2682 Gelem/s 3.3257 Gelem/s]
change:
time: [−52.253% −50.004% −47.859%] (p = 0.00 < 0.05)
thrpt: [+91.787% +100.02% +109.44%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

add_multilinears_sparse/optimized/10
time: [88.945 µs 90.308 µs 91.541 µs]
thrpt: [715.92 Melem/s 725.70 Melem/s 736.81 Melem/s]
change:
time: [−2.9823% +0.0009% +2.8453%] (p = 1.00 > 0.05)
thrpt: [−2.7666% −0.0009% +3.0740%]
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe

add_multilinears_sparse/optimized/50
time: [88.014 µs 88.741 µs 89.512 µs]
thrpt: [732.15 Melem/s 738.51 Melem/s 744.61 Melem/s]
change:
time: [−9.0176% −7.0711% −5.2000%] (p = 0.00 < 0.05)
thrpt: [+5.4853% +7.6092% +9.9113%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe

add_multilinears_sparse/optimized/90
time: [86.818 µs 89.123 µs 92.157 µs]
thrpt: [711.13 Melem/s 735.34 Melem/s 754.87 Melem/s]
change:
time: [−6.2371% −4.5692% −2.7610%] (p = 0.00 < 0.05)
thrpt: [+2.8394% +4.7879% +6.6520%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe

add_multilinears_sparse/optimized/100
time: [85.672 µs 86.673 µs 87.685 µs]
thrpt: [747.40 Melem/s 756.13 Melem/s 764.96 Melem/s]
change:
time: [−3.8784% −1.3785% +0.8765%] (p = 0.27 > 0.05)
thrpt: [−0.8689% +1.3978% +4.0349%]
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe

When doing this
In order to avoid double allocations (`column_up(&batched_column)`, then `add_multilinears`), you could do in-place operations, which should be much more efficient.
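The code snippets from this comment are not preserved here, so the following is only a minimal sketch of the general idea, not the code from this PR. It contrasts an allocating path (a transform that returns a fresh `Vec`, followed by an addition that returns another) with a single in-place pass. Everything in it is an assumption made for illustration: the function names are hypothetical, the rotate-by-one transform is a stand-in and is not claimed to match what `column_up` does, and `u64` wrapping addition replaces the real field arithmetic.

```rust
/// Allocating transform: returns a new Vec (first allocation).
/// Hypothetical stand-in for a call like `column_up(&batched_column)`;
/// the rotate-by-one is illustrative only.
fn transform_alloc(src: &[u64]) -> Vec<u64> {
    let mut out = src.to_vec();
    if !out.is_empty() {
        out.rotate_left(1);
    }
    out
}

/// Allocating addition: returns a new Vec (second allocation).
/// u64 wrapping addition stands in for field addition.
fn add_alloc(a: &[u64], b: &[u64]) -> Vec<u64> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// In-place variant: reads the transformed value directly from `src`
/// and accumulates it into `acc`, so no intermediate Vec is created.
fn add_transformed_in_place(acc: &mut [u64], src: &[u64]) {
    assert_eq!(acc.len(), src.len());
    let n = src.len();
    for (i, slot) in acc.iter_mut().enumerate() {
        // Same element that `transform_alloc` would place at index i.
        *slot = slot.wrapping_add(src[(i + 1) % n]);
    }
}
```

Both paths produce the same values. The in-place one trades two temporary buffers for index arithmetic inside the loop, which tends to matter more as the evaluation tables grow.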