[EPIC]: Improve usability of architecture specific features in libcudacxx #1083

jrhemstad · 2023-11-10T13:44:25Z

Is this a duplicate?

I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

libcu++

Is your feature request related to a problem? Please describe.

As a CUDA developer using libcu++, I want to be able to use architecture-dependent features of libcudacxx in my CUDA application. For any given libcudacxx header and feature, I need to be able to do the following:

#include <cuda/header>

__global__ void kernel(...){
  // The first matching target wins, so the newest architecture is listed first.
  NV_DISPATCH_TARGET(
    NV_PROVIDES_SM_90,   ( do_sm90_thing(); ),
    NV_PROVIDES_SM_70,   ( do_sm70_thing(); ),
    NV_IS_EXACTLY_SM_60, ( do_sm60_thing(); )
  )
}

I need to be able to compile this file for any set of architectures (-gencode arch=compute_XX,code=sm_XX) and have it compile and link successfully, as long as I only use each architecture-dependent feature in an appropriately guarded code path, whether the guard is NV_IF_TARGET or __CUDA_ARCH__.

However, this does not work universally today. For example, the following fails to compile with -gencode arch=compute_52,code=sm_52 -gencode arch=compute_70,code=sm_70:

#include <cuda/atomic>
#include <nv/target>

__global__ void kernel(){
  NV_IF_TARGET(
    NV_PROVIDES_SM_70,
      cuda::atomic<int> i;
  )
}

https://godbolt.org/z/ddMaW65Ej

This is because the cuda/atomic header unconditionally errors whenever it is included in a TU that is compiled for an architecture older than sm_60, even if the feature is never used in code paths for the unsupported architecture.

A similar problem exists with cuda/barrier: https://godbolt.org/z/aEjsMT5YK
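
One way a header could avoid the include-time error described above is to move the diagnostic from the point of inclusion to the point of use. The following is only a minimal sketch of that idea, not how libcu++ implements or plans to implement it. Every name in it (DEMO_TARGET_ARCH, demo::atomic_like, demo::always_false, demo::use_feature_if_supported) is hypothetical, and DEMO_TARGET_ARCH stands in for __CUDA_ARCH__ so that the sketch also builds with a plain host compiler.

```cpp
// Hypothetical stand-in for __CUDA_ARCH__ so this sketch builds without nvcc.
// The default of 520 models a device pass for sm_52.
#ifndef DEMO_TARGET_ARCH
#  define DEMO_TARGET_ARCH 520
#endif

namespace demo {

// True when the (simulated) target is new enough for the feature.
constexpr bool arch_supports_atomic = DEMO_TARGET_ARCH >= 600;

// Always false, but dependent on T, so a static_assert that uses it is only
// evaluated when the enclosing template is actually instantiated.
template <class T>
struct always_false {
  static constexpr bool value = false;
};

// Supported targets get the (greatly simplified) implementation.
template <class T, bool Supported = arch_supports_atomic>
struct atomic_like {
  T value{};
  T load() const { return value; }
  void store(T v) { value = v; }
};

// Unsupported targets can still include the header. The error only appears
// if the type is instantiated in a code path compiled for such a target.
template <class T>
struct atomic_like<T, false> {
  static_assert(always_false<T>::value,
                "atomic_like requires a target architecture of at least 600");
};

// A guarded use: the feature is only touched on supported targets.
inline int use_feature_if_supported() {
#if DEMO_TARGET_ARCH >= 600
  atomic_like<int> a;
  a.store(42);
  return a.load();
#else
  return -1;  // fallback path for older targets
#endif
}

}  // namespace demo
```

With the default DEMO_TARGET_ARCH of 520, the translation unit compiles and demo::use_feature_if_supported() takes the fallback path. Writing demo::atomic_like<int> outside the guard would trigger the static_assert instead.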

Describe the solution you'd like

I should be able to do the following with all libcu++ headers and features:

#include <cuda/header>

__global__ void kernel(...){
  // The first matching target wins, so the newest architecture is listed first.
  NV_DISPATCH_TARGET(
    NV_PROVIDES_SM_90,   ( do_sm90_thing(); ),
    NV_PROVIDES_SM_70,   ( do_sm70_thing(); ),
    NV_IS_EXACTLY_SM_60, ( do_sm60_thing(); )
  )
}
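
For comparison, the same kind of per-architecture selection can be written with __CUDA_ARCH__ directly. This is a minimal sketch rather than libcu++ code: DEMO_HD and selected_path are hypothetical names, and the returned integers merely stand in for do_sm60_thing() and friends. The branches are tested from newest to oldest because a preprocessor chain takes the first condition that holds.

```cpp
// DEMO_HD is a hypothetical helper macro: it adds the CUDA execution-space
// qualifiers under nvcc and expands to nothing for a plain host compiler.
#if defined(__CUDACC__)
#  define DEMO_HD __host__ __device__
#else
#  define DEMO_HD
#endif

// Returns a tag for the code path compiled for the current target.
// __CUDA_ARCH__ is only defined during device compilation passes
// (for example, 700 when compiling for sm_70).
DEMO_HD inline int selected_path() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
  return 90;  // stand-in for do_sm90_thing()
#elif defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
  return 70;  // stand-in for do_sm70_thing()
#elif defined(__CUDA_ARCH__) && (__CUDA_ARCH__ == 600)
  return 60;  // stand-in for do_sm60_thing()
#else
  return 0;   // host pass, or a device architecture with no special path
#endif
}
```

Built with a host-only compiler, selected_path() returns 0. Under nvcc, each device compilation pass sees its own __CUDA_ARCH__ value and keeps only the matching branch.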

Tasks

Redesign libcudacxx architecture specific testing #1084

Enable using relevant parts of <cuda/atomic> on sm_52
Enable including <cuda/barrier> on sm_52

Describe alternatives you've considered

If libcu++ doesn't do this, then I am forced to use lower-level facilities such as atomicAdd() or inline PTX.

Additional context

Related issues:
#997
#1082
#624

jrhemstad · 2023-11-10T14:04:39Z

Note that the status quo has meant we can't even use <cuda/atomic> in CUB or Thrust (see #515 and #516).

If we can't use it in our own libraries, how do we expect other people to use it?

jrhemstad added the "feature request" label (New feature or request.) Nov 10, 2023

jrhemstad mentioned this issue Nov 10, 2023

Redesign libcudacxx architecture specific testing #1084

Open

jrhemstad mentioned this issue Apr 1, 2024

[BUG]: <cuda/atomic> header should be included only in device compilation mode. NVIDIA/cuCollections#449

Closed

1 task

mfbalin mentioned this issue Apr 1, 2024

[GraphBolt] Add optimized unique_and_compact_batched. dmlc/dgl#7239

Merged

8 tasks

wmaxey linked a pull request May 13, 2024 that will close this issue

Enable including atomics and friends in TUs that do not support them. #1736

Open

2 tasks

mfbalin mentioned this issue Jul 12, 2024

[ENHANCEMENT]: Get rid of of custom atomic operations once CCCL 2.4 is ready NVIDIA/cuCollections#469

Closed

jrhemstad changed the title ~~[FEA]: Improve usability of architecture specific features in libcudacxx~~ [EPIC]: Improve usability of architecture specific features in libcudacxx Jul 22, 2024

jrhemstad assigned wmaxey Jul 22, 2024
