[BUG] Investigate inclusive_scan issue in column-row conversion code #1579

Closed
ttnghia opened this issue Nov 21, 2023 · 3 comments
Labels
bug Something isn't working

Comments

Collaborator

ttnghia commented Nov 21, 2023

In #1567, we hit an invalid memory access:

C++ exception with description "inclusive_scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered" thrown in the test body.

#1577 is a temporary workaround. It does not address the root cause, so we should spend more time investigating what is actually wrong here.

Collaborator

sameerz commented Nov 21, 2023

We are not calling the code referenced in #1577, so I am moving this to 24.02 for investigation.

Collaborator Author

ttnghia commented Nov 30, 2023

This seems to be a compiler bug, and I can't find a consistent condition that triggers it. All I can tell is that the issue may show up when thrust::inclusive_scan or thrust::exclusive_scan operates on non-pointer input iterators. The issue may disappear if I modify some unrelated code, or if I compile the source and link it directly into the application instead of building it into the shared library, etc.

In summary, we may have to live with it for now, pending further analysis or a fix in a newer compiler version.
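For readers unfamiliar with the pattern, here is a minimal sketch of a Thrust scan whose input is a non-pointer iterator: a thrust::transform_iterator over a thrust::counting_iterator, so per-row sizes are computed on the fly instead of being read from a device buffer. The size formula, the functor row_size_fn and the helper row_end_offsets are hypothetical stand-ins, not the actual column-row conversion code, and this is not a reproducer for the crash.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/scan.h>

// Hypothetical per-row size: 8 bytes of fixed-width data plus 0, 4 or 8 bytes of
// variable-width data, derived from the row index rather than loaded from memory.
struct row_size_fn {
  __host__ __device__ int operator()(int row) const { return 8 + (row % 3) * 4; }
};

// Inclusive scan over a non-pointer (fancy) input iterator: the sizes exist only
// as values produced by the transform iterator and are never stored in a buffer.
std::vector<int> row_end_offsets(int num_rows)
{
  auto sizes = thrust::make_transform_iterator(thrust::counting_iterator<int>(0), row_size_fn{});
  thrust::device_vector<int> offsets(num_rows);
  thrust::inclusive_scan(sizes, sizes + num_rows, offsets.begin());
  thrust::host_vector<int> host = offsets;  // copy device -> host
  return std::vector<int>(host.begin(), host.end());
}

int main()
{
  std::vector<int> offsets = row_end_offsets(6);
  for (int offset : offsets) { std::printf("%d ", offset); }
  std::printf("\n");            // prints: 8 20 36 44 56 72
  assert(offsets.back() == 72);  // total size of all six rows
  return 0;
}
```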

Collaborator Author

ttnghia commented Jan 23, 2024

After investigation, we found that the issue was caused by a kernel visibility problem. It has been fixed by rapidsai/rapids-cmake#523 and NVIDIA/cuCollections#422.

Closing this as resolved.
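As a rough illustration of what a kernel visibility problem can mean, here is a minimal sketch of the general mechanism, not of the exact changes made in the linked PRs. Thrust, CUB and cuCollections kernels are templates, so every shared library that uses them carries its own instantiations, and with default visibility their host-side launch stubs are exported. The dynamic linker may then bind a launch in one library to a same-named instantiation in another library that was compiled against different headers or for different GPU architectures, which can surface as an illegal memory access. That would also be consistent with the issue disappearing when the code was linked directly into the application. Marking a kernel hidden keeps each library's instantiations private. The kernel scale_kernel and the helper scale_on_device are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>

#include <cuda_runtime.h>

// Hypothetical template kernel marked with hidden visibility: the launch stub for
// each instantiation stays private to the shared object it is compiled into, so
// another library's identically named instantiation cannot be bound in its place.
template <typename T>
__global__ __attribute__((visibility("hidden"))) void scale_kernel(T* data, T factor, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { data[i] *= factor; }
}

// Copies values to the device, scales them, and copies them back.
// Error checking is omitted for brevity; expects a non-empty input.
std::vector<int> scale_on_device(std::vector<int> values, int factor)
{
  int n = static_cast<int>(values.size());
  int* d_data = nullptr;
  cudaMalloc(&d_data, n * sizeof(int));
  cudaMemcpy(d_data, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  scale_kernel<<<(n + 255) / 256, 256>>>(d_data, factor, n);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(values.data(), d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_data);
  return values;
}

int main()
{
  std::vector<int> result = scale_on_device({1, 2, 3}, 2);
  for (int v : result) { std::printf("%d ", v); }
  std::printf("\n");  // prints: 2 4 6
  assert(result.back() == 6);
  return 0;
}
```

Where annotating every kernel is impractical, compiling the host side of a library with hidden default visibility (for example, passing `-Xcompiler -fvisibility=hidden` to nvcc) should have a similar effect; whether the linked PRs took this route is an assumption here.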

ttnghia closed this as completed Jan 23, 2024