⚡️ Speed up function trim_mean by 20%
#41
📄 20% (0.20x) speedup for `trim_mean` in `framework/py/flwr/serverapp/strategy/fedtrimmedavg.py`
⏱️ Runtime: 1.50 milliseconds → 1.25 milliseconds (best of 130 runs)
📝 Explanation and details
The optimized code introduces two key performance improvements:

1. Early return for no-trim cases: added an `if lowercut == 0:` check that directly computes `np.mean(array, axis=axis)` when no trimming is needed. This avoids the expensive `np.partition` operation entirely when `cut_fraction` is 0 or `lowercut` rounds down to 0.
2. Simplified array slicing: replaced the slice-list creation and tuple conversion (`slice_list = [slice(None)] * atmp.ndim; slice_list[axis] = slice(lowercut, uppercut); atmp[tuple(slice_list)]`) with direct slicing (`atmp[lowercut:uppercut]`). This eliminates the overhead of creating intermediate objects and converting them to a tuple.

Why this leads to speedup:
- The early return skips the `np.partition` operation (which took ~30% of original runtime) for no-trim cases.

Test case performance:
This optimization is particularly effective for test cases with `cut_fraction=0` (like `test_trim_mean_basic_1d_no_trim`, `test_trim_mean_large_array_no_trim`) and very small fractions that round to 0 (like `test_trim_mean_large_array_cut_fraction_near_zero`). The remaining cases still benefit from the simplified slicing operation.

✅ Correctness verification report:
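The equivalence of the two versions can be illustrated with a minimal sketch. This is a reconstruction from the explanation above, not the PR's actual diff: the function names `trim_mean_original` and `trim_mean_optimized` are hypothetical, the reduction axis is assumed to be fixed at 0 (which is what makes `atmp[lowercut:uppercut]` equivalent to the slice-list form), and the `ValueError` guard is an assumption about the surrounding code.

```python
import numpy as np


def trim_mean_original(array, cut_fraction):
    """Hypothetical reconstruction of the pre-optimization logic (axis 0)."""
    axis = 0  # assumption: the function always reduces along the first axis
    nobs = array.shape[axis]
    lowercut = int(cut_fraction * nobs)
    uppercut = nobs - lowercut
    if lowercut > uppercut:  # assumed guard, not described in the PR text
        raise ValueError("Fraction too big.")

    # Partition so that indices lowercut..uppercut-1 hold the middle values.
    atmp = np.partition(array, (lowercut, uppercut - 1), axis)

    # Generic slicing: build one slice per dimension, then index with a tuple.
    slice_list = [slice(None)] * atmp.ndim
    slice_list[axis] = slice(lowercut, uppercut)
    return np.mean(atmp[tuple(slice_list)], axis=axis)


def trim_mean_optimized(array, cut_fraction):
    """Hypothetical reconstruction of the optimized logic (axis 0)."""
    axis = 0
    nobs = array.shape[axis]
    lowercut = int(cut_fraction * nobs)
    uppercut = nobs - lowercut
    if lowercut > uppercut:  # assumed guard, not described in the PR text
        raise ValueError("Fraction too big.")

    # Early return: nothing to trim, so skip np.partition entirely.
    if lowercut == 0:
        return np.mean(array, axis=axis)

    atmp = np.partition(array, (lowercut, uppercut - 1), axis)

    # Direct slicing along axis 0 replaces the slice-list/tuple construction.
    return np.mean(atmp[lowercut:uppercut], axis=axis)


# Compare both versions on random 2-D data for several cut fractions.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 3))
for frac in (0.0, 0.01, 0.1, 0.25):
    expected = trim_mean_original(data, frac)
    actual = trim_mean_optimized(data, frac)
    assert np.allclose(expected, actual)
```

The comparison uses `np.allclose` rather than exact equality because the no-trim path averages the unpartitioned array, so the summation order (and therefore floating-point rounding) may differ slightly from the original.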
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-trim_mean-mhcw4ywy` and push.