[BUG]: CUB device_transform breaks nvc++ -stdpar #2402
Comments
Hi! I am sorry this causes a breakage for nvc++. I didn't know that Since I am leaving for parental leave very soon, the only quick solution I see is
and then figure out how we can proceed later.
Discussed with @jrhemstad, who is going to follow up on this for the short term.
I discussed this briefly with @jrhemstad yesterday and we would like to fix cooperative groups in the long run (option 1). However, this may still take a while. In the meantime, once #2396 is merged, we can disable the ublkcp kernel that uses cooperative groups when compiling with nvc++ (option 3). The prefetch implementation should work with nvc++ and also deliver solid runtime improvements.
I could reproduce and work around the issue by disabling CG and the ublkcp kernel:
That's the extent to which I could test CUB with nvc++.
Is this a duplicate?
Type of Bug
Compile-time Error
Component
CUB
Describe the bug
PR #2086 breaks the stdexec example nvexec.launch when compiled with NVC++. Compilation fails with unhelpful errors such as `error: namespace "cooperative_groups" has no member "thread_block_tile"`. @ericniebler

PR #2086 added two new files to the CUB headers. One of them, `cub/device/dispatch/dispatch_transform.cuh`, which is indirectly included from `cub/cub.cuh`, contains `#include <cooperative_groups.h>`. The header `<cooperative_groups.h>` is entirely wrapped by an `#if defined(__cplusplus) && defined(__CUDACC__)` block. When compiling with `nvc++ -stdpar=gpu`, the macro `__CUDACC__` is not defined, so `<cooperative_groups.h>` is a no-op. Subsequent attempts to use anything from the `cooperative_groups` namespace fail with undefined identifiers.

This doesn't break NVC++'s stdpar parallel algorithms yet because nothing in the parallel algorithm implementation includes `cub/cub.cuh` or `cub/device/device_transform.cuh`. But that will change if `thrust::transform` is changed to use the new CUB transform algorithms. I would like to get this fixed before that happens, while the impact of this bug is still small.

I don't know the correct way to fix this. Some possibilities are:

- Change `<cooperative_groups.h>` to work with `nvc++ -stdpar`. (CUB would still need to deal with the issue as long as a CUDA Toolkit without the cooperative groups change is still supported.)
- Change `cub/cub.cuh` to not include `<cub/device/device_transform.cuh>`. Any code that wants to use the new CUB transform algorithms needs to include `<cub/device/device_transform.cuh>` explicitly. (This then pushes the problem to Thrust, which would need to adopt option 2 or 3.)

All the options have tradeoffs, and I don't know how best to balance those tradeoffs.
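To make the failure mode concrete, here is a minimal sketch of the preprocessor mechanics described above. It is not CUB's actual code. The names `DEMO_HAS_COOPERATIVE_GROUPS`, `cooperative_groups_demo`, `DemoKernel`, and both functions are hypothetical and exist only for illustration. The first block mimics a header whose whole body sits behind the `__CUDACC__` guard; the second shows how a consumer could record whether that body was compiled and fall back to a code path that does not need it, instead of referring to names that were never declared.

```cpp
// Stand-in for a header whose entire body is guarded the way the issue
// describes <cooperative_groups.h>: without __CUDACC__, nothing is declared.
#if defined(__cplusplus) && defined(__CUDACC__)
namespace cooperative_groups_demo  // hypothetical namespace, not the real one
{
struct thread_block_tile_tag
{};
} // namespace cooperative_groups_demo
#  define DEMO_HAS_COOPERATIVE_GROUPS 1 // hypothetical feature macro
#else
#  define DEMO_HAS_COOPERATIVE_GROUPS 0
#endif

// Hypothetical labels for the two code paths mentioned in the comments:
// one that relies on cooperative groups and one that does not.
enum class DemoKernel
{
  ublkcp,
  prefetch
};

// Reports whether the guarded declarations above were compiled.
constexpr bool demo_has_cooperative_groups()
{
  return DEMO_HAS_COOPERATIVE_GROUPS != 0;
}

// Picks a code path without ever naming an undeclared identifier.
constexpr DemoKernel demo_selected_kernel()
{
#if DEMO_HAS_COOPERATIVE_GROUPS
  // Safe: thread_block_tile_tag is known to be declared on this branch.
  return (void) cooperative_groups_demo::thread_block_tile_tag{}, DemoKernel::ublkcp;
#else
  return DemoKernel::prefetch;
#endif
}

// The selection must always agree with the feature macro.
static_assert(demo_has_cooperative_groups()
                ? demo_selected_kernel() == DemoKernel::ublkcp
                : demo_selected_kernel() == DemoKernel::prefetch,
              "kernel selection must follow the guard");
```

With a compiler that does not define `__CUDACC__`, the first branch is skipped entirely, so any unguarded use of `cooperative_groups_demo::thread_block_tile_tag` would fail with an undefined-identifier error, which is the same shape of failure reported above. Guarding the use site with the same condition as the declaration keeps the translation unit valid in both modes.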
How to Reproduce
Though first noticed by the stdexec example nvexec.launch, which includes `<cub/cub.cuh>`, it can be reproduced with a much smaller test, with NVC++ that uses the latest main branch of CCCL.
Expected behavior
It should be possible to use CUB with `nvc++ -stdpar` without errors.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response