[BUG]: Unfavourable interaction between thrust::sort and thrust::sort_by_key #655

Closed
1 task done
mvieth opened this issue Nov 3, 2023 · 3 comments
Labels
bug Something isn't working right. needs triage Issues that require the team's attention

Comments


mvieth commented Nov 3, 2023

Is this a duplicate?

Type of Bug

Silent Failure

Component

Not sure

Describe the bug

I noticed that thrust::sort_by_key fails (gives a wrong result) under a very specific set of circumstances, namely:

  • thrust::sort_by_key is used in a shared library, let's say library A
  • thrust::sort is used in a second shared library, let's say library B
  • both are linked to the main program
  • thrust::sort_by_key is called with more than 4864 elements (the threshold above which a specific sorting algorithm is selected)

Using git bisect, I determined that this problem was introduced by NVIDIA/cub@c4299c4, meaning that all thrust/cub 2.x.y versions are affected, while 1.x.y versions are fine.
The problem occurs with GCC under Linux, but not with MSVC under Windows.
When I run it with compute-sanitizer, it reports the following for both histogram_kernel and exclusive_sum_kernel: Program hit cudaErrorMissingConfiguration (error 52) due to "__global__ function call is not configured" on CUDA API call to cudaGetLastError.
My best guess at what happens: some symbols in library A and library B get mixed up during linking, possibly because some functions (such as DeviceRadixSortExclusiveSumKernel) don't have ValueT in their template parameter list (ValueT is cub::NullType for thrust::sort and something else for thrust::sort_by_key).
This might also happen with other thrust functions (unconfirmed, but possible, I think).
This bug was first noticed in PointCloudLibrary/pcl#5846
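The hypothesis above can be illustrated in plain C++. This is a sketch, not CUDA, and it does not reproduce the failure; every name below (NullType, exclusive_sum_kernel_stub, pick_exclusive_sum_kernel) is a hypothetical stand-in, not CUB's actual code. It shows that a template which omits ValueT from its parameter list yields one and the same instantiation, and therefore one mangled symbol, for both the keys-only sort and the key-value sort. If library A and library B each contain a copy of such a symbol, then on Linux/ELF, where shared-library symbols have default visibility, the dynamic linker binds references to a single copy. Windows DLLs export nothing unless explicitly marked, which would be consistent with MSVC being unaffected.

```cpp
#include <cassert>

// Hypothetical stand-in for cub::NullType, the value type used by keys-only sorts.
struct NullType {};

// Hypothetical stand-in for a kernel such as DeviceRadixSortExclusiveSumKernel:
// it is templated on the offset type only, not on the value type.
template <typename OffsetT>
void exclusive_sum_kernel_stub(OffsetT* /*bins*/) {}

template <typename OffsetT>
using ExclusiveSumFn = void (*)(OffsetT*);

// Hypothetical dispatch routine instantiated by both sort paths.
// ValueT is not forwarded, so the selected instantiation depends only on OffsetT.
template <typename KeyT, typename ValueT, typename OffsetT>
ExclusiveSumFn<OffsetT> pick_exclusive_sum_kernel() {
  return &exclusive_sum_kernel_stub<OffsetT>;
}

// True if the keys-only path (thrust::sort-like, ValueT = NullType) and the
// key-value path (thrust::sort_by_key-like, ValueT = double) end up with the
// same instantiation, i.e. the same mangled symbol.
bool same_kernel_for_both_sorts() {
  return pick_exclusive_sum_kernel<float, NullType, int>() ==
         pick_exclusive_sum_kernel<float, double, int>();
}
```

Under this guess, a launch issued from library A could end up going through library B's copy of such a symbol. That would fit the "not configured" errors above, but it remains an assumption, not a confirmed diagnosis.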

How to Reproduce

Here is a minimal reproducible example: thrust_test.zip

Expected behavior

thrust::sort_by_key should always give the correct (sorted) result.

Reproduction link

No response

Operating System

Linux (exact version or distro does not matter)

nvidia-smi output

Not relevant for the problem, as far as I can tell

NVCC version

Not relevant, but thrust/cub must be version 2.0.0 or newer (as described above)

@mvieth mvieth added the bug Something isn't working right. label Nov 3, 2023
@github-actions github-actions bot added the needs triage Issues that require the team's attention label Nov 3, 2023
github-actions bot (Contributor) commented Nov 3, 2023

Hi @mvieth!

Thanks for submitting this issue - the CCCL team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.

@gevtushenko (Collaborator) commented

@mvieth thank you for reporting the issue! It should have been addressed by #443. Could you please try CCCL/main to see whether you can still reproduce it?

mvieth (Author) commented Nov 4, 2023

@gevtushenko Yes, that solves it. Thanks!

@mvieth mvieth closed this as completed Nov 4, 2023
Projects
Archived in project
Development

No branches or pull requests

3 participants