Commit

#5 CPU version is faster if guarded by loop ranges
carljohnsen committed Mar 7, 2024
1 parent c4cfb1b commit b41402e
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions src/lib/cpp/cpu/diffusion.cc
@@ -18,9 +18,11 @@ namespace cpu_par {

     float sum = 0.0f;

-    for (int64_t r = -radius; r <= radius; r++) {
+    // Does not add performance:
+    //#pragma omp simd reduction(+:sum)
+    for (int64_t r = ranges[0]; r <= ranges[1]; r++) {
         const int64_t input_index = output_index + r*stride[dim];
-        float val = r >= ranges[0] && r <= ranges[1] ? input[input_index] : 0.0f;
+        float val = input[input_index];
         sum += val * kernel[radius+r];
     }
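
The change in this hunk can be illustrated with a minimal 1D sketch. It is not the project's code: the function names `convolve_branchy` and `convolve_guarded` are hypothetical, and the way `ranges` is computed here (the kernel taps that stay inside the array) is an assumption, since the hunk does not show its definition. The sketch only demonstrates that moving the bounds test from the loop body into the loop limits leaves the result unchanged; the commit title reports the guarded form as faster, presumably because the inner loop no longer evaluates a condition for every tap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Before: visit every kernel tap and test each one against the valid range.
std::vector<float> convolve_branchy(const std::vector<float> &input,
                                    const std::vector<float> &kernel,
                                    int64_t radius) {
    assert(static_cast<int64_t>(kernel.size()) == 2 * radius + 1);
    const int64_t n = static_cast<int64_t>(input.size());
    std::vector<float> output(input.size(), 0.0f);
    for (int64_t i = 0; i < n; i++) {
        // Assumed definition: offsets r for which i + r lies in [0, n).
        const int64_t ranges[2] = { std::max(-radius, -i),
                                    std::min(radius, n - 1 - i) };
        float sum = 0.0f;
        for (int64_t r = -radius; r <= radius; r++) {
            // The load only happens when r is in range; otherwise the tap is 0.
            float val = r >= ranges[0] && r <= ranges[1]
                            ? input[static_cast<std::size_t>(i + r)]
                            : 0.0f;
            sum += val * kernel[static_cast<std::size_t>(radius + r)];
        }
        output[static_cast<std::size_t>(i)] = sum;
    }
    return output;
}

// After: the loop limits themselves are the guard, so the body has no branch.
std::vector<float> convolve_guarded(const std::vector<float> &input,
                                    const std::vector<float> &kernel,
                                    int64_t radius) {
    assert(static_cast<int64_t>(kernel.size()) == 2 * radius + 1);
    const int64_t n = static_cast<int64_t>(input.size());
    std::vector<float> output(input.size(), 0.0f);
    for (int64_t i = 0; i < n; i++) {
        const int64_t ranges[2] = { std::max(-radius, -i),
                                    std::min(radius, n - 1 - i) };
        float sum = 0.0f;
        for (int64_t r = ranges[0]; r <= ranges[1]; r++) {
            sum += input[static_cast<std::size_t>(i + r)]
                 * kernel[static_cast<std::size_t>(radius + r)];
        }
        output[static_cast<std::size_t>(i)] = sum;
    }
    return output;
}
```

Both variants accumulate the same in-range products in the same order, so for finite kernel values they should produce identical output; the out-of-range taps in the first variant only ever add zero.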

