Hello Halide team,

I have been exploring the possibility of porting my old CPU SIFT implementation to pure Halide code on a GPU. In my current experiment, as far as keypoint localization is concerned, I was able to get something like a 20x speedup on my MacBook Air (and I am very happy about that).
There is one improvement I would like to make: counting the local extrema on the GPU using Halide.
This amounts to writing the equivalent of numpy.count_nonzero or std::count_if before I carry on with stream compaction on the GPU, which diverges a bit from the use cases that the docs illustrate.
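For concreteness, the CPU-side equivalent I am comparing against is just a linear scan over the packed extrema map (the function name and the flat-buffer layout here are made up for illustration):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Count the nonzero entries of a dense extrema map stored as a flat
// std::int8_t buffer (values in {-1, 0, +1}), i.e. the CPU analogue of
// numpy.count_nonzero.
std::int64_t count_extrema(const std::vector<std::int8_t>& f)
{
  return std::count_if(f.begin(), f.end(),
                       [](std::int8_t v) { return v != 0; });
}
```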
I wrote a simple generator like this:
```cpp
// Inside `void generate()`
//
// `f` is a 4D buffer of std::int8_t that stores a batch of `n` dense maps
// of local scale-space extrema f(x, y, s).
//
// A position (x, y, s) is marked as:
// - `+1` if it is a local scale-space maximum
// - `-1` if it is a local scale-space minimum
// - `0` otherwise.
const auto& w = f.dim(0).extent();
const auto& h = f.dim(1).extent();
const auto& c = f.dim(2).extent();
const auto& n = f.dim(3).extent();

auto r = RDom(0, w, 0, h, 0, c, 0, n);
auto nonzero = select(f(r.x, r.y, r.z, r.w) != 0,  //
                      std::int32_t{1},             //
                      std::int32_t{0});
out() = sum(nonzero);
```
This naive generator was actually the fastest code I could get. I tried using rfactor, following the tutorial, but it made things slower; I may be using it wrong.
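For reference, the associative-reduction pattern that rfactor is meant to expose can be sketched in plain C++: split the reduction domain into strips, count each strip independently (this is the loop that could run in parallel), then sum the partial counts. This is only an illustration of the factoring idea, not Halide code, and all names are made up:

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Two-stage count: partial counts per strip, then a final sum over the
// partials. This mirrors the structure rfactor produces for an
// associative reduction; the speedup comes from parallelizing or
// vectorizing the per-strip loop, which is serial in this sketch.
std::int64_t count_extrema_two_stage(const std::vector<std::int8_t>& f,
                                     std::size_t num_strips)
{
  std::vector<std::int64_t> partial(num_strips, 0);
  const std::size_t strip = (f.size() + num_strips - 1) / num_strips;
  for (std::size_t s = 0; s < num_strips; ++s)  // parallelizable loop
  {
    const std::size_t lo = s * strip;
    const std::size_t hi = std::min(f.size(), lo + strip);
    for (std::size_t i = lo; i < hi; ++i)
      partial[s] += (f[i] != 0);
  }
  return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```

If the intermediate stage is not actually scheduled in parallel (or the strips are too small), the extra pass over the partials only adds overhead, which may be why rfactor made things slower here.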
As a reference, on my MacBook Air, the generated Halide code takes 130 ms to count the extrema, while the STL function std::count_if takes about 2 ms.
I would love your input.