Realizing to a crop of a Buffer doesn't work on GPU. #8395
I think I narrowed it down to the scenario where the buffer does not have a device allocation, but you realize to a crop. The cropped buffer sees there is no device allocation and thus allocates, but it allocates only the crop instead of the full buffer. Also, dirty bits are not updated on the underlying buffer when the cropped/sliced buffer is made dirty. @abadams Can we discuss this at the dev meeting? It's failing in many subtle ways, so I think some input would be valuable.
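A minimal model of the failure mode described above, in plain C++ with hypothetical names (none of these are Halide's real runtime types): when the crop is the first thing to trigger a device allocation, only the crop's extent gets allocated, so the parent ends up sharing an allocation that is too small for it.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical model of the bug; not Halide's actual runtime.
struct DeviceAlloc { std::size_t bytes = 0; };

struct Buffer {
    std::size_t host_bytes = 0;
    std::shared_ptr<DeviceAlloc> dev;  // shared with crops that alias it
    Buffer *parent = nullptr;          // non-null for crops

    Buffer crop(std::size_t bytes) {
        Buffer c;
        c.host_bytes = bytes;
        c.parent = this;
        c.dev = dev;  // aliases the parent's allocation *if one exists*
        return c;
    }

    // Buggy behavior: when no device allocation exists yet, realizing
    // to the crop allocates only enough bytes for the crop itself.
    void ensure_device_alloc() {
        if (!dev) {
            dev = std::make_shared<DeviceAlloc>(DeviceAlloc{host_bytes});
            if (parent) parent->dev = dev;  // parent now holds a too-small alloc
        }
    }
};

// Returns the parent's device allocation size after a crop allocates first.
std::size_t buggy_parent_device_bytes() {
    Buffer img;
    img.host_bytes = 1024;
    Buffer tile = img.crop(64);
    tile.ensure_device_alloc();  // crop allocates before the parent does
    return img.dev->bytes;       // 64: covers only the crop, not the full 1024
}
```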
So, just to be clear: to fix my particular issue I did:

```cpp
// (2) Make buffer for the result
U16_Image denoised_YCoCg(noisy.width(), noisy.height(), noisy.channels());
denoised_YCoCg.device_malloc(halide_cuda_device_interface());
// ...
// (5) Work with the denoised_YCoCg buffer...
denoised_YCoCg.set_device_dirty(true); // Tell the parent buffer that it actually changed!
```
Summary of the current behavior:
The issue arises because:
So at the very least, we should always keep track of which other Buffer a Buffer is a crop of, even when there is no device-side memory yet. Additional issue:
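The parent tracking mentioned above could be sketched like this (plain C++, hypothetical names, not Halide's runtime): the crop records its parent at crop time, independent of any device allocation, so a later `set_device_dirty` can walk up the chain.

```cpp
// Hypothetical sketch of tracking "which Buffer this is a crop of",
// established at crop time, independent of any device allocation.
struct TrackedBuffer {
    bool device_dirty = false;
    TrackedBuffer *crop_parent = nullptr;

    TrackedBuffer crop() {
        TrackedBuffer c;
        c.crop_parent = this;  // recorded even with no device memory yet
        return c;
    }

    void set_device_dirty(bool d = true) {
        device_dirty = d;
        // Propagate upward so enclosing buffers know their data changed.
        for (TrackedBuffer *p = crop_parent; d && p; p = p->crop_parent)
            p->device_dirty = true;
    }
};

// With propagation, dirtying a crop of a crop dirties the whole chain.
bool parent_sees_dirty() {
    TrackedBuffer img;
    TrackedBuffer tile = img.crop();
    TrackedBuffer sub = tile.crop();
    sub.set_device_dirty(true);
    return img.device_dirty && tile.device_dirty;
}
```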
Conclusion from the dev meeting: either we do:
Either way, we need to figure out why we don't see the error that says the device is still dirty when the crop goes out of scope. (1) This raises the question: how would we make it clear that, even if you specify you do NOT want device-side aliasing, the resulting crop will still alias the parent device buffer whenever a device allocation already exists?
@zvookin Managing the dirty bits is something we haven't discussed yet. I think the starting point would be to modify Halide/src/runtime/HalideRuntime.h Lines 1597 to 1603 in b87f2b1
They would somehow need to propagate this dirty bit to the parent buffer. However, the link to the parent buffer, we established, was going to be through a virtual device interface. That interface has no dirty-bit-related functions, so I think we might be stuck again with this approach...
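Since the virtual device interface has no dirty-bit functions today, here is a hedged sketch of what adding such a hook to the vtable might look like. All names here are hypothetical stand-ins, not the real layout of `halide_device_interface_t` or `halide_buffer_t`:

```cpp
// Stand-in for halide_buffer_t; field names are illustrative only.
struct buf_t {
    bool device_dirty = false;
    buf_t *crop_parent = nullptr;
};

// Hypothetical vtable extension: a hook the runtime could call when a
// crop/slice is marked dirty, so the backend can mark parents too.
struct device_interface_ext {
    int (*propagate_device_dirty)(buf_t *crop);
};

static int propagate_device_dirty_impl(buf_t *crop) {
    for (buf_t *b = crop; b; b = b->crop_parent)
        b->device_dirty = true;
    return 0;  // success, in the style of halide_error_code_t
}

static const device_interface_ext example_interface = {
    propagate_device_dirty_impl,
};

bool demo_vtable_propagation() {
    buf_t parent, tile;
    tile.crop_parent = &parent;
    example_interface.propagate_device_dirty(&tile);  // call through the vtable
    return parent.device_dirty && tile.device_dirty;
}
```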
Here is a repro. Comment out either of the two lines below the `// Problem` comments to see it fail.

Original context for my use case:
I'm working on a denoiser and was currently experimenting with denoising in YCoCg colorspace with different filter banks for Y and for Co/Cg. So naturally, I...
This strategy, as far as I understand, should work; and it does work on CPU-compiled pipelines (everything computed on the host, no device copies). I tried it with CUDA, OpenCL, and Vulkan, and it doesn't work on any of them.