[BUG]: DeviceScan algorithms load uninitialized/clobbered scratch memory and perform operations on these #458
Labels
bug
Type of Bug
Silent Failure
Component
CUB
Describe the bug
I have code where we first call cub::DeviceRadixSort::SortPairs, then cub::DeviceScan::InclusiveScan with a custom functor, and then cub::DeviceRadixSort::SortPairs again. All three invocations use the same scratch memory pool: each call uses only the exact temporary storage size it requires, and the scratch area is sized to the maximum of the three required temporary storage sizes.
When working in debugging mode, an assert() within the custom functor keeps on firing, which led me to believe there was a bug in how I was using CUB. After a long debugging session trying to build a reproducer, I found that cub::DeviceScan::InclusiveScan calls the functor on clobbered memory in the scratch area. Whilst the final output is correct and satisfies all the invariants, I find it odd that CUB calls the functor on clobbered/uninitialized memory.
How to Reproduce
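The call pattern described above (sort, scan with a custom functor, sort again, all sharing one scratch buffer) can be sketched roughly as follows. This is a hypothetical illustration assuming CUB 2.x device-wide APIs and int keys/values, not the reporter's reproducer, and it is not claimed to trigger the reported behavior. SumOp, SharedScratchBytes, RankPrefixSums, and CUDA_CHECK are illustrative names; SumOp is a plain sum standing in for the report's PaddedSum functor, whose definition is not shown. Each call first queries its own temp_storage_bytes by passing d_temp_storage == nullptr and then runs with exactly that size, while the shared buffer is allocated once at the largest size any of the calls reports.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime or CUB call fails.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::abort();                                                       \
    }                                                                     \
  } while (0)

// Hypothetical stand-in for the report's PaddedSum functor: a plain int sum.
struct SumOp {
  __host__ __device__ int operator()(const int& a, const int& b) const {
    return a + b;
  }
};

// Size-only queries (d_temp_storage == nullptr): CUB does no work and writes
// the required byte count. The shared buffer must cover the largest one.
size_t SharedScratchBytes(int n) {
  size_t sort_bytes = 0, scan_bytes = 0;
  CUDA_CHECK(cub::DeviceRadixSort::SortPairs(
      nullptr, sort_bytes, static_cast<const int*>(nullptr),
      static_cast<int*>(nullptr), static_cast<const int*>(nullptr),
      static_cast<int*>(nullptr), n));
  CUDA_CHECK(cub::DeviceScan::InclusiveScan(
      nullptr, scan_bytes, static_cast<const int*>(nullptr),
      static_cast<int*>(nullptr), SumOp{}, n));
  return std::max(sort_bytes, scan_bytes);
}

// Sort, scan, then sort again, reusing one scratch buffer for all three calls.
// Returns, for each input key, the inclusive prefix sum at its sorted position.
std::vector<int> RankPrefixSums(const std::vector<int>& h_keys) {
  const int n = static_cast<int>(h_keys.size());
  if (n == 0) return {};
  std::vector<int> h_idx(n);
  for (int i = 0; i < n; ++i) h_idx[i] = i;

  int *d_keys, *d_idx, *d_sorted, *d_perm, *d_prefix, *d_perm_out, *d_out;
  int** bufs[] = {&d_keys, &d_idx, &d_sorted, &d_perm, &d_prefix, &d_perm_out, &d_out};
  for (int** p : bufs) CUDA_CHECK(cudaMalloc(p, n * sizeof(int)));
  CUDA_CHECK(cudaMemcpy(d_keys, h_keys.data(), n * sizeof(int), cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice));

  void* d_scratch = nullptr;  // one pool, sized to the max requirement
  CUDA_CHECK(cudaMalloc(&d_scratch, SharedScratchBytes(n)));
  size_t bytes = 0;  // per-call exact size, re-queried before each call

  // 1) Sort (key, original index) pairs by key (CUB radix sort is stable).
  CUDA_CHECK(cub::DeviceRadixSort::SortPairs(nullptr, bytes, d_keys, d_sorted, d_idx, d_perm, n));
  CUDA_CHECK(cub::DeviceRadixSort::SortPairs(d_scratch, bytes, d_keys, d_sorted, d_idx, d_perm, n));

  // 2) Inclusive scan of the sorted keys with the custom functor.
  CUDA_CHECK(cub::DeviceScan::InclusiveScan(nullptr, bytes, d_sorted, d_prefix, SumOp{}, n));
  CUDA_CHECK(cub::DeviceScan::InclusiveScan(d_scratch, bytes, d_sorted, d_prefix, SumOp{}, n));

  // 3) Sort (original index, prefix sum) pairs by index to restore input order.
  CUDA_CHECK(cub::DeviceRadixSort::SortPairs(nullptr, bytes, d_perm, d_perm_out, d_prefix, d_out, n));
  CUDA_CHECK(cub::DeviceRadixSort::SortPairs(d_scratch, bytes, d_perm, d_perm_out, d_prefix, d_out, n));

  std::vector<int> h_out(n);
  CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost));
  for (int** p : bufs) CUDA_CHECK(cudaFree(*p));
  CUDA_CHECK(cudaFree(d_scratch));
  return h_out;
}
```

For example, RankPrefixSums({3, 1, 2}) sorts the keys to {1, 2, 3}, scans them to {1, 3, 6}, and scatters the sums back to input order, giving {6, 1, 3}. All calls are issued on the default stream, so reusing the single scratch buffer across them is ordered by that stream.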
I have managed to build a minimal reproducer:
I have tried to make vecValue as small as possible, but any smaller and the problem doesn't manifest.
My compile line:
nvcc -DNDEBUG=1 -Xptxas=-v -std=c++17 -G -g -gencode arch=compute_89,code=sm_89 bug.cu
Running the code:
Expected behavior
The assert()s in PaddedSum::operator() shouldn't fire, since CUB shouldn't invoke the functor on uninitialized memory.
Reproduction link
No response
Operating System
Rocky 9.2
nvidia-smi output
NVCC version