Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(iba): IBA::perpixel_op #4299

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented Jun 17, 2024

Inspired by a question by Vlad Erium, I have added a simpler way for C++ users of OIIO to construct IBA-like functions for simple unary and binary operations on ImageBufs where each pixel is independent and based only on the corresponding pixel of the input(s).

The user only needs to supply the contents of the inner loop, i.e. just doing one pixel's work, and only needs to work for float values. All format conversion, sizing and allocation of the destination buffer, looping over pixels, and multithreading is automatic.

If the actual buffers in question are not float-based, conversions will happen automatically, at about a 2x slowdown compared to everything being in float all along, which seems reasonable for the extreme simplicity, especially for use cases where the buffers are fairly likely to be float anyway.

What you pass is a function or lambda that takes spans for the output and input pixel values. Here's an example that adds two images channel by channel, producing a sum image:

// Assume ImageBuf A, B are the inputs, ImageBuf R is the output
R = ImageBufAlgo::perpixel_op(A, B,
        [](span<float> r, cspan<float> a, cspan<float> b) {
            for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
                r[c] = a[c] + b[c];
            return true;
        });

This is exactly equivalent to calling

R = ImageBufAlgo::add(A, B);

and for float IB's, it's just as fast.

To make the not-float case fast and not require the DISPATCH macro magic, I needed to change the ImageBuf::Iterator just a bit to add store() and load() method templates to the iterators, and add a field that holds the buffer type. That might make a slight ABI tweak, so I am thinking that I will make this for the upcoming OIIO 3.0, and not backport to the release branch.

I think this is ready to introduce at this time, but I'm also studying whether more varieties of this approach are needed, whether the non-float case can be sped up even more, and whether some of the existing IBA functions should switch to using this internally (good candidates would be those that are almost always performed on float buffers, but for which the heavy template expansion of the DISPATCH approach to handling the full type zoo currently makes them very bloated and expensive to compile, for very little real-world gain).

We should probably consider this to be experimental for a little while, just in case the function signature for this changes as I think about it more or add functionality.

@lgritz
Copy link
Collaborator Author

lgritz commented Jun 17, 2024

I haven't added docs yet, I will do so by the time this is merged. I'm still struggling with exactly where to explain it.

Inspired by a question by Vlad Erium, I have added a simpler way for
C++ users of OIIO to construct IBA-like functions for simple unary and
binary operations on ImageBufs where each pixel is independent and
based only on the corresponding pixel of the input(s).

The user only needs to supply the contents of the inner loop, i.e.
just doing one pixel's work, and only needs to work for float values.
All format conversion, sizing and allocation of the destination
buffer, looping over pixels, and multithreading is automatic.

If the actual buffers in question are not float-based, conversions
will happen automatically, at about a 2x slowdown compared to
everything being in float all along, which seems reasonable for the
extreme simplicity, especially for use cases where the buffers are
fairly likely to be float anyway.

What you pass is a function or lambda that takes spans for the output
and input pixel values. Here's an example that adds two images channel
by channel, producing a sum image:

    // Assume ImageBuf A, B are the inputs, ImageBuf R is the output
    R = ImageBufAlgo::perpixel_op(A, B,
            [](span<float> r, cspan<float> a, cspan<float> b) {
                for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
                    r[c] = a[c] + b[c];
                return true;
            });

This is exactly equivalent to calling

    R = ImageBufAlgo::add(A, B);

and for float IB's, it's just as fast.

To make the not-float case fast and not require the DISPATCH macro
magic, I needed to change the ImageBuf::Iterator just a bit to add
store() and load() method templates to the iterators, and add a field
that holds the buffer type. That might make a slight ABI tweak, so I
am thinking that I will make this for the upcoming OIIO 3.0, and not
backport to the release branch.

I think this is ready to introduce at this time, but I'm also studying
whether more varieties of this approach are needed, whether the
non-float case can be sped up even more, and whether some of the
existing IBA functions should switch to using this internally (good
candidates would be those that are almost always performed on float
buffers, but for which the heavy template expansion of the DISPATCH
approach to handling the full type zoo currently makes them very
bloated and expensive to compile, for very little real-world gain).

We should probably consider this to be experimental for a little
while, just in case the function signature for this changes as I think
about it more or add functionality.

Signed-off-by: Larry Gritz <[email protected]>
@lgritz
Copy link
Collaborator Author

lgritz commented Jun 21, 2024

I just pushed an update in which I do a bit of an overhaul of this: I made the perpixel_op functions take a KWArgs options. This subsumes the prepflags and nthreads arguments and leaves room for lots of expansion going forward without needing to change the function call ABI.

I also added new version of IBAPrep (I am seeking a better name, BTW, ideas appreciated) that takes a KWArgs rather than an integer full of bit flags. Again, I think that leaves more room for expansion in the future, including options that aren't just boolean bits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant