Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: DeviceScan algorithms load uninitialized/clobbered scratch memory and perform operations on these #458

Open
1 task done
pb-dseifert opened this issue Sep 19, 2023 · 2 comments
Labels
bug Something isn't working right.

Comments

@pb-dseifert
Copy link
Contributor

pb-dseifert commented Sep 19, 2023

Is this a duplicate?

Type of Bug

Silent Failure

Component

CUB

Describe the bug

I have code where we first call cub::DeviceRadixSort::SortPairs, then cub::DeviceScan::InclusiveScan with a custom functor and then cub::DeviceRadixSort::SortPairs again. All three invocations carefully use the same scratch memory pool, taking care to only use the exact size they require individually but ensuring that the scratch area is the maximum of all three required temporary storage sizes.

When working in debugging mode, an assert() within the custom functor keeps on firing, leading me to believe there is a bug in how I was using CUB. After a long debugging sessions trying to build a reproducer, I found that cub::DeviceScan::InclusiveScan calls the functor on clobbered memory in the scratch area. Whilst the final output is correct and satisfies all the invariants, I find in odd that CUB calls the functor on clobbered/uninitialized memory.

How to Reproduce

I have managed to build a minimal reproducer:

#undef NDEBUG

#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

#include <cub/cub.cuh>

struct alignas(std::uint64_t) Value
{
    std::int32_t Start;
    std::int32_t Length;
};

struct PaddedSum
{
    __device__ Value operator()(const Value& lhs, const Value& rhs) const noexcept
    {
        assert(lhs.Start % 32 == 0);
        assert(rhs.Start % 32 == 0);

        Value result = rhs;
        result.Start += (lhs.Start + lhs.Length + 31) & ~31;  // make .Start a multiple of 32
        return result;
    }
};

#define gpuErrchk(ans)                        \
    {                                         \
        gpuAssert((ans), __FILE__, __LINE__); \
    }
inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
    if (code != cudaSuccess) {
        std::fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) std::exit(code);
    }
}

int main()
{
    std::vector<Value> vecValue = {
        {0, 57}, {0, 29}, {0, 49}, {0, 41}, {0, 39}, {0, 51}, {0, 57}, {0, 40}, {0, 35}, {0, 54},
        {0, 49}, {0, 39}, {0, 44}, {0, 50}, {0, 42}, {0, 51}, {0, 46}, {0, 44}, {0, 35}, {0, 54},
        {0, 53}, {0, 37}, {0, 40}, {0, 29}, {0, 33}, {0, 45}, {0, 41}, {0, 41}, {0, 29}, {0, 48},
        {0, 44}, {0, 50}, {0, 52}, {0, 48}, {0, 47}, {0, 41}, {0, 57}, {0, 63}, {0, 41}, {0, 36},
        {0, 43}, {0, 46}, {0, 45}, {0, 50}, {0, 50}, {0, 29}, {0, 37}, {0, 48}, {0, 38}, {0, 57},
        {0, 56}, {0, 56}, {0, 45}, {0, 26}, {0, 50}, {0, 43}, {0, 43}, {0, 39}, {0, 53}, {0, 27},
        {0, 61}, {0, 40}, {0, 41}, {0, 49}, {0, 30}, {0, 33}, {0, 49}, {0, 43}, {0, 41}, {0, 35},
        {0, 52}, {0, 39}, {0, 54}, {0, 31}, {0, 41}, {0, 35}, {0, 42}, {0, 45}, {0, 50}, {0, 56},
        {0, 31}, {0, 48}, {0, 45}, {0, 54}, {0, 41}, {0, 35}, {0, 55}, {0, 51}, {0, 37}, {0, 54},
        {0, 32}, {0, 62}, {0, 45}, {0, 44}, {0, 44}, {0, 33}, {0, 25}, {0, 33}, {0, 33}, {0, 36},
        {0, 41}, {0, 46}, {0, 45}, {0, 49}, {0, 36}, {0, 34}, {0, 35}, {0, 45}, {0, 29}, {0, 51},
        {0, 27}, {0, 45}, {0, 37}, {0, 33}, {0, 37}, {0, 45}, {0, 39}, {0, 23}, {0, 40}, {0, 50},
        {0, 52}, {0, 50}, {0, 54}, {0, 50}, {0, 28}, {0, 39}, {0, 54}, {0, 39}, {0, 45}, {0, 49},
        {0, 33}, {0, 47}, {0, 41}, {0, 32}, {0, 61}, {0, 44}, {0, 52}, {0, 46}, {0, 43}, {0, 33},
        {0, 32}, {0, 42}, {0, 33}, {0, 48}, {0, 42}, {0, 29}, {0, 51}, {0, 46}, {0, 50}, {0, 50},
        {0, 38}, {0, 44}, {0, 43}, {0, 45}, {0, 38}, {0, 39}, {0, 33}, {0, 49}, {0, 47}, {0, 35},
        {0, 35}, {0, 45}, {0, 46}, {0, 57}, {0, 60}, {0, 37}, {0, 37}, {0, 31}, {0, 49}, {0, 54},
        {0, 46}, {0, 52}, {0, 48}, {0, 38}, {0, 48}, {0, 47}, {0, 43}, {0, 48}, {0, 52}, {0, 52},
        {0, 45}, {0, 52}, {0, 43}, {0, 38}, {0, 35}, {0, 50}, {0, 53}, {0, 35}, {0, 32}, {0, 66},
        {0, 48}, {0, 40}, {0, 47}, {0, 56}, {0, 53}, {0, 51}, {0, 38}, {0, 38}, {0, 53}, {0, 35},
        {0, 55}, {0, 41}, {0, 48}, {0, 47}, {0, 35}, {0, 53}, {0, 59}, {0, 51}, {0, 49}, {0, 45},
        {0, 49}, {0, 45}, {0, 49}, {0, 37}, {0, 55}, {0, 45}, {0, 43}, {0, 43}, {0, 38}, {0, 48},
        {0, 47}, {0, 43}, {0, 45}, {0, 47}, {0, 49}, {0, 58}, {0, 46}, {0, 55}, {0, 34}, {0, 47},
        {0, 43}, {0, 39}, {0, 39}, {0, 50}, {0, 43}, {0, 48}, {0, 47}, {0, 50}, {0, 52}, {0, 47},
        {0, 60}, {0, 38}, {0, 35}, {0, 45}, {0, 51}, {0, 35}, {0, 29}, {0, 32}, {0, 36}, {0, 57},
        {0, 54}, {0, 50}, {0, 41}, {0, 27}, {0, 36}, {0, 52}, {0, 29}, {0, 46}, {0, 31}, {0, 37},
        {0, 47}, {0, 57}, {0, 51}, {0, 35}, {0, 33}, {0, 49}, {0, 41}, {0, 33}, {0, 49}, {0, 41},
        {0, 52}, {0, 38}, {0, 44}, {0, 47}, {0, 45}, {0, 53}, {0, 39}, {0, 42}, {0, 50}, {0, 39},
        {0, 59}, {0, 29}, {0, 29}, {0, 61}, {0, 37}, {0, 33}, {0, 47}, {0, 60}, {0, 36}, {0, 31},
        {0, 45}, {0, 49}, {0, 51}, {0, 54}, {0, 46}, {0, 55}, {0, 38}, {0, 39}, {0, 45}, {0, 47},
        {0, 39}, {0, 47}, {0, 43}, {0, 36}, {0, 39}, {0, 40}, {0, 41}, {0, 47}, {0, 55}, {0, 48},
        {0, 33}, {0, 31}, {0, 63}, {0, 35}, {0, 32}, {0, 46}, {0, 45}, {0, 41}, {0, 49}, {0, 55},
        {0, 48}, {0, 43}, {0, 29}, {0, 41}, {0, 34}, {0, 37}, {0, 45}, {0, 59}, {0, 37}, {0, 53},
        {0, 45}, {0, 33}, {0, 42}, {0, 42}, {0, 47}, {0, 41}, {0, 37}, {0, 45}, {0, 49}, {0, 41},
        {0, 49}, {0, 45}, {0, 44}, {0, 61}, {0, 59}, {0, 50}, {0, 33}, {0, 43}, {0, 36}, {0, 56},
        {0, 33}, {0, 43}, {0, 43}, {0, 62}, {0, 33}, {0, 38}, {0, 47}, {0, 42}, {0, 25}, {0, 55},
        {0, 61}, {0, 44}, {0, 39}, {0, 37}, {0, 43}, {0, 35}, {0, 58}, {0, 63}, {0, 47}, {0, 33},
        {0, 38}, {0, 57}, {0, 47}, {0, 42}, {0, 43}, {0, 43}, {0, 33}, {0, 48}, {0, 48}, {0, 31},
        {0, 35}, {0, 33}, {0, 47}, {0, 51}, {0, 38}, {0, 41}, {0, 55}, {0, 36}, {0, 46}, {0, 39},
        {0, 27}, {0, 40}, {0, 45}, {0, 47}, {0, 58}, {0, 40}, {0, 54}, {0, 41}, {0, 39}, {0, 41},
        {0, 52}, {0, 35}, {0, 36}, {0, 50}, {0, 49}, {0, 33}, {0, 45}, {0, 33}, {0, 40}, {0, 40},
        {0, 39}, {0, 58}, {0, 36}, {0, 52}, {0, 60}, {0, 39}, {0, 36}, {0, 46}, {0, 42}, {0, 49},
        {0, 15}, {0, 48}, {0, 41}, {0, 47}, {0, 51}, {0, 48}, {0, 40}, {0, 45}, {0, 35}, {0, 37},
        {0, 34}, {0, 43}, {0, 50}, {0, 43}, {0, 32}, {0, 54}, {0, 45}, {0, 44}, {0, 36}, {0, 51},
        {0, 47}, {0, 37}, {0, 31}, {0, 40}, {0, 35}, {0, 51}, {0, 37}, {0, 53}, {0, 55}, {0, 37},
        {0, 53}, {0, 35}, {0, 50}, {0, 48}, {0, 42}, {0, 34}, {0, 49}, {0, 42}, {0, 64}, {0, 40},
        {0, 37}, {0, 44}, {0, 56}, {0, 49}, {0, 47}, {0, 43}, {0, 55}, {0, 44}, {0, 44}, {0, 46},
        {0, 21}, {0, 48}, {0, 54}, {0, 49}, {0, 37}, {0, 45}, {0, 60}, {0, 42}, {0, 49}, {0, 48},
        {0, 49}, {0, 45}, {0, 44}, {0, 37}, {0, 53}, {0, 63}, {0, 44}, {0, 31}, {0, 39}, {0, 42},
        {0, 56}, {0, 46}, {0, 41}, {0, 52}, {0, 46}, {0, 50}, {0, 31}, {0, 35}, {0, 47}, {0, 60},
        {0, 33}, {0, 39}, {0, 40}, {0, 49}, {0, 46}, {0, 51}, {0, 48}, {0, 55}, {0, 45}, {0, 27},
        {0, 43}, {0, 39}, {0, 40}, {0, 33}, {0, 49}, {0, 35}, {0, 48}, {0, 33}, {0, 45}, {0, 61},
        {0, 42}, {0, 44}, {0, 38}, {0, 37}, {0, 49}, {0, 35}, {0, 50}, {0, 31}, {0, 49}, {0, 37},
        {0, 46}, {0, 45}, {0, 43}, {0, 35}, {0, 43}, {0, 65}, {0, 36}, {0, 39}, {0, 51}, {0, 34},
        {0, 41}, {0, 43}, {0, 40}, {0, 35}, {0, 54}, {0, 51}, {0, 35}, {0, 36}, {0, 42}, {0, 42},
        {0, 39}, {0, 37}, {0, 41}, {0, 56}, {0, 48}, {0, 37}, {0, 38}, {0, 40}, {0, 48}, {0, 48},
        {0, 26}, {0, 47}, {0, 48}, {0, 51}, {0, 46}, {0, 48}, {0, 46}, {0, 38}, {0, 43}, {0, 41},
        {0, 47}, {0, 36}, {0, 56}, {0, 38}, {0, 49}, {0, 40}, {0, 63}, {0, 35}, {0, 49}, {0, 44},
        {0, 47}, {0, 40}, {0, 35}, {0, 51}, {0, 31}, {0, 37}, {0, 52}, {0, 37}, {0, 59}, {0, 29},
        {0, 49}, {0, 31}, {0, 37}, {0, 54}, {0, 49}, {0, 35}, {0, 44}, {0, 37}, {0, 41}, {0, 48},
        {0, 41}, {0, 47}, {0, 48}, {0, 31}, {0, 43}, {0, 47}, {0, 53}, {0, 47}, {0, 39}, {0, 46},
        {0, 53}, {0, 55}, {0, 41}, {0, 37}, {0, 35}, {0, 43}, {0, 49}, {0, 46}, {0, 45}, {0, 37},
        {0, 43}, {0, 51}, {0, 52}, {0, 63}, {0, 54}, {0, 41}, {0, 41}, {0, 42}, {0, 58}, {0, 37},
        {0, 44}, {0, 35}, {0, 47}, {0, 54}, {0, 39}, {0, 51}, {0, 34}, {0, 53}, {0, 66}, {0, 45},
        {0, 39}, {0, 43}, {0, 35}, {0, 43}, {0, 46}, {0, 52}, {0, 54}, {0, 41}, {0, 41}, {0, 43},
        {0, 40}, {0, 47}, {0, 45}, {0, 45}, {0, 57}, {0, 38}, {0, 41}, {0, 27}, {0, 57}, {0, 41},
        {0, 48}, {0, 35}, {0, 49}, {0, 34}, {0, 53}, {0, 36}, {0, 53}, {0, 54}, {0, 40}, {0, 35},
        {0, 25}, {0, 38}, {0, 43}, {0, 54}, {0, 45}, {0, 58}, {0, 49}, {0, 44}, {0, 49}, {0, 45},
        {0, 39}, {0, 39}, {0, 40}, {0, 51}, {0, 38}, {0, 43}, {0, 40}, {0, 50}, {0, 35}, {0, 44},
        {0, 45}, {0, 32}, {0, 46}, {0, 30}, {0, 43}, {0, 60}, {0, 49}, {0, 35}, {0, 23}, {0, 35},
        {0, 49}, {0, 45}, {0, 31}, {0, 45}, {0, 47}, {0, 64}, {0, 40}, {0, 63}, {0, 43}, {0, 48},
        {0, 43}, {0, 44}, {0, 53}, {0, 57}, {0, 44}, {0, 57}, {0, 48}, {0, 46}, {0, 46}, {0, 47},
        {0, 52}, {0, 59}, {0, 49}, {0, 51}, {0, 59}, {0, 39}, {0, 41}, {0, 46}, {0, 43}, {0, 43},
        {0, 46}, {0, 44}, {0, 46}, {0, 37}, {0, 45}, {0, 45}, {0, 39}, {0, 45}, {0, 41}, {0, 32},
        {0, 40}, {0, 45}, {0, 39}, {0, 43}, {0, 33}, {0, 64}, {0, 33}, {0, 42}, {0, 47}, {0, 43},
        {0, 47}, {0, 56}, {0, 45}, {0, 41}, {0, 40}, {0, 41}, {0, 45}, {0, 21}, {0, 61}, {0, 42},
        {0, 39}, {0, 60}, {0, 35}, {0, 48}, {0, 47}, {0, 49}, {0, 37}, {0, 32}, {0, 39}, {0, 45},
        {0, 43}, {0, 51}, {0, 47}, {0, 35}, {0, 39}, {0, 47}, {0, 50}, {0, 40}, {0, 43}, {0, 44},
        {0, 54}, {0, 41}, {0, 57}, {0, 38}, {0, 51}, {0, 43}, {0, 29}, {0, 40}, {0, 51}, {0, 43},
        {0, 39}, {0, 40}, {0, 35}, {0, 23}, {0, 49}, {0, 34}, {0, 52}, {0, 35}, {0, 35}, {0, 33},
        {0, 42}, {0, 36}, {0, 43}, {0, 31}, {0, 49}, {0, 37}, {0, 57}, {0, 41}, {0, 45}, {0, 47},
        {0, 56}, {0, 38}, {0, 47}, {0, 38}, {0, 41}, {0, 45}, {0, 41}, {0, 40}, {0, 43}, {0, 55},
        {0, 52}, {0, 49}, {0, 55}, {0, 35}, {0, 39}, {0, 28}, {0, 65}, {0, 49}, {0, 53}, {0, 42},
        {0, 47}, {0, 43}, {0, 41}, {0, 48}, {0, 53}, {0, 35}, {0, 43}, {0, 43}, {0, 41}, {0, 35},
        {0, 43}, {0, 40}, {0, 39}, {0, 46}, {0, 53}, {0, 37}, {0, 37}, {0, 47}, {0, 39}, {0, 43},
        {0, 53}, {0, 48}, {0, 54}, {0, 29}, {0, 53}, {0, 50}, {0, 44}, {0, 40}, {0, 47}, {0, 47},
        {0, 50}, {0, 44}, {0, 35}, {0, 56}, {0, 34}, {0, 42}, {0, 47}, {0, 47}, {0, 43}, {0, 36},
        {0, 54}, {0, 40}, {0, 38}, {0, 45}, {0, 41}, {0, 44}, {0, 41}, {0, 49}, {0, 40}, {0, 41},
        {0, 37}, {0, 43}, {0, 36}, {0, 48}, {0, 55}, {0, 43}, {0, 54}, {0, 54}, {0, 48}, {0, 38},
        {0, 54}, {0, 39}, {0, 56}, {0, 51}, {0, 43}, {0, 26}, {0, 43}, {0, 44}, {0, 41}, {0, 56},
        {0, 28}, {0, 37}, {0, 45}, {0, 37}, {0, 53}, {0, 35}, {0, 53}, {0, 43}, {0, 44}, {0, 31},
        {0, 47}, {0, 38}, {0, 43}, {0, 47}, {0, 44}, {0, 40}, {0, 51}, {0, 46}, {0, 47}, {0, 29},
        {0, 33}, {0, 60}, {0, 35}, {0, 35}, {0, 39}, {0, 49}, {0, 47}, {0, 52}, {0, 55}, {0, 37},
        {0, 41}, {0, 43}, {0, 47}, {0, 44}, {0, 31}, {0, 44}, {0, 34}, {0, 45}, {0, 41}, {0, 51},
        {0, 40}, {0, 35}, {0, 50}, {0, 31}, {0, 49}, {0, 46}, {0, 49}, {0, 32}, {0, 45}, {0, 35},
        {0, 45}, {0, 46}, {0, 37}, {0, 55}, {0, 50}, {0, 43}, {0, 21}, {0, 38}, {0, 42}, {0, 35},
        {0, 45}, {0, 34}, {0, 27}, {0, 41}, {0, 68}, {0, 56}, {0, 38}, {0, 39}, {0, 47}, {0, 49},
        {0, 52}, {0, 43}, {0, 36}, {0, 42}, {0, 47}, {0, 42}, {0, 51}, {0, 31}, {0, 35}, {0, 33},
        {0, 35}, {0, 36}, {0, 45}, {0, 48}, {0, 39}, {0, 29}, {0, 54}, {0, 44}, {0, 41}, {0, 44},
        {0, 43}, {0, 43}, {0, 49}, {0, 45}, {0, 45}, {0, 46}, {0, 57}, {0, 33}, {0, 39}, {0, 37},
        {0, 44}, {0, 45}, {0, 41}, {0, 45}, {0, 33}, {0, 40}, {0, 40}, {0, 37}, {0, 39}, {0, 44},
        {0, 52}, {0, 44}, {0, 49}, {0, 31}, {0, 37}, {0, 34}, {0, 39}, {0, 49}, {0, 47}, {0, 31},
        {0, 41}, {0, 33}, {0, 51}, {0, 36}, {0, 53}, {0, 45}, {0, 51}, {0, 39}, {0, 46}, {0, 37},
    };

    // input is correct
    for (const auto [Start, Length] : vecValue) {
        assert(Start % 32 == 0);
    }

    const std::int32_t numItems = vecValue.size();
    const std::size_t memBytes = sizeof(Value) * numItems;

    cudaSetDevice(0);
    gpuErrchk(cudaPeekAtLastError());

    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    gpuErrchk(cudaPeekAtLastError());

    Value* devPtr{};
    cudaMalloc(&devPtr, memBytes);
    gpuErrchk(cudaPeekAtLastError());

    std::size_t storageBytes{};
    PaddedSum op;
    cub::DeviceScan::InclusiveScan(nullptr, storageBytes, devPtr, op, numItems, stream);

    void* storageCub{};
    cudaMalloc(&storageCub, storageBytes);
    gpuErrchk(cudaPeekAtLastError());

    cudaMemcpyAsync(devPtr, vecValue.data(), memBytes, cudaMemcpyDefault, stream);
    gpuErrchk(cudaPeekAtLastError());

    // This line is essential for triggering the bug! Comment and the bug goes away
    cudaMemsetAsync(storageCub, 78, storageBytes, stream);

    cub::DeviceScan::InclusiveScan(storageCub, storageBytes, devPtr, op, numItems, stream);
    gpuErrchk(cudaPeekAtLastError());

    cudaMemcpyAsync(vecValue.data(), devPtr, memBytes, cudaMemcpyDefault, stream);
    gpuErrchk(cudaPeekAtLastError());

    cudaStreamSynchronize(stream);
    gpuErrchk(cudaPeekAtLastError());

    // output is still correct
    for (const auto [Start, Length] : vecValue) {
        assert(Start % 32 == 0);
    }

    cudaFree(devPtr);
    gpuErrchk(cudaPeekAtLastError());

    cudaFree(storageCub);
    gpuErrchk(cudaPeekAtLastError());

    cudaStreamDestroy(stream);
    gpuErrchk(cudaPeekAtLastError());
}

I have tried to make vecValue as small possible, but any smaller and the problem doesn't manifest.

My compile line: nvcc -DNDEBUG=1 -Xptxas=-v -std=c++17 -G -g -gencode arch=compute_89,code=sm_89 bug.cu. Running the code:

$ ./a.out 
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [1,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [2,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [3,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [4,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [5,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [6,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [7,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [8,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [9,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [10,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [11,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [12,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [13,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [14,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [15,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [16,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [17,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [18,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [19,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [20,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [21,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [22,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [23,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [24,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [25,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [26,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [27,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [28,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [29,0,0] Assertion `lhs.Start % 32 == 0` failed.
bug.cu:21: Value PaddedSum::operator()(const Value &, const Value &) const noexcept: block: [1,0,0], thread: [30,0,0] Assertion `lhs.Start % 32 == 0` failed.
GPUassert: device-side assert triggered bug.cu 187

Expected behavior

The assert()s in PaddedSum::operator() shouldn't fire, since CUB shouldn't invoke the functor on uninitialized memory.

Reproduction link

No response

Operating System

Rocky 9.2

nvidia-smi output

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 6000 Ada Gene...    Off | 00000000:06:00.0  On |                  Off |
| 30%   59C    P3              64W / 300W |   2471MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

NVCC version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
@pb-dseifert pb-dseifert added the bug Something isn't working right. label Sep 19, 2023
@jrhemstad
Copy link
Collaborator

Yeah, this has come up a few times before and it is definitely on our radar to address it.

See also:
#822
#789
#801

@jrhemstad
Copy link
Collaborator

Since this has popped up a few times, I started a tracking issue here: #459

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Status: Todo
Development

No branches or pull requests

2 participants