Redesign libcudacxx architecture specific testing #1084

jrhemstad · 2023-11-10T13:44:32Z

Summary

As described in #1083, as a user of libcudacxx headers and features, I want to use libcudacxx architecture-specific features in a TU compiled for multiple architectures (e.g., -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70), so long as I guard the code paths that use those features. For example:

#include <cuda/header>

__global__ void kernel(...){
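  // Conditions are checked in order and the first match wins,
  // so the most specific target is listed first.
  NV_DISPATCH_TARGET(
    NV_PROVIDES_SM_90,   ( do_sm90_thing(); ),
    NV_PROVIDES_SM_70,   ( do_sm70_thing(); ),
    NV_IS_EXACTLY_SM_60, ( do_sm60_thing(); )
  )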
}

This does not work today, in part because of how libcudacxx tests architecture-specific features.

libcudacxx uses lit to compile and run its tests, and uses the UNSUPPORTED: keyword to mark environments that a test does not support, including GPU architectures. For example:

https://github.com/NVIDIA/libcudacxx/blob/206d8f9179deda6006795865c9c61cbd24b5e6cc/.upstream-tests/test/cuda/memcpy_async_16.pass.cpp#L11-L20

The // UNSUPPORTED: pre-sm-70 line means that this test file is skipped entirely whenever the build includes an architecture below sm_70, e.g., when compiling with -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70.

As a result, libcudacxx architecture-conditional tests never run when the build targets multiple architectures and at least one of them is unsupported. This has already caused bugs to be missed (see #664).
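To make the gap concrete, here is a small plain C++ model of the skip decision. It is only an illustration: the function names are made up, architectures are plain integers (60 means sm_60), and none of this is taken from the actual lit configuration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Models today's behaviour as described above: a single architecture below
// the test's minimum is enough for the whole test file to be skipped.
// Hypothetical helper, not part of libcudacxx or lit.
inline bool skipped_today(const std::vector<int>& build_archs, int min_sm) {
  return std::any_of(build_archs.begin(), build_archs.end(),
                     [min_sm](int sm) { return sm < min_sm; });
}

// Models the behaviour this issue asks for: the file is always built, and the
// test body runs on every architecture that provides the feature.
// Hypothetical helper, not part of libcudacxx or lit.
inline std::vector<int> should_run_on(const std::vector<int>& build_archs,
                                      int min_sm) {
  std::vector<int> out;
  std::copy_if(build_archs.begin(), build_archs.end(), std::back_inserter(out),
               [min_sm](int sm) { return sm >= min_sm; });
  return out;
}
```

For a build targeting sm_60 and sm_70 and a test that needs sm_70, skipped_today reports that the file is skipped, while should_run_on still lists sm_70 as an architecture where the test ought to run.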

Solution

The libcudacxx test infrastructure was designed on the assumption that each test file is compiled for a single architecture. That does not reflect how users actually compile their code.

The test infrastructure needs to be redesigned so that architecture-specific tests still run when compiling for multiple architectures.

How exactly we do this still needs to be determined, but effectively, instead of this:

// UNSUPPORTED: pre-sm-70
#include "memcpy_async.h"
int main(int argc, char ** argv)
{
    test_select_source<uint16_t>();
    return 0;
}

we want this:

#include "memcpy_async.h"
int main(int argc, char ** argv)
{
    NV_IF_TARGET(
       NV_PROVIDES_SM_70, ( test_select_source<uint16_t>(); )
    )
    return 0;
}

It would be nice to avoid rewriting the tests that currently use // UNSUPPORTED: pre-sm-70, and instead implicitly inject the appropriate NV_IF_TARGET logic based on the information in the UNSUPPORTED: key, perhaps by injecting a different fake_main that bakes the appropriate NV_IF_TARGET logic in.
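One way to picture the injected wrapper is the plain C++ model below. Everything in it is hypothetical: guarded_main, test_body, and the MinArch/Arch parameters are illustrative names, and the ordinary if stands in for what NV_IF_TARGET with NV_PROVIDES_SM_70 would do in a real CUDA build. MinArch is assumed to be derived by the harness from the UNSUPPORTED: key, and Arch represents the architecture of the current compilation pass (700 means sm_70).

```cpp
// Counts how often the test body actually ran (for illustration only).
static int g_body_runs = 0;

// Stand-in for the test's original main(); in the example above it would
// call test_select_source<uint16_t>().
inline int test_body() {
  ++g_body_runs;
  return 0;
}

// Hypothetical wrapper the harness would inject instead of editing the test.
// In a real build the comparison would be an NV_IF_TARGET guard, not an if.
template <int MinArch, int Arch>
int guarded_main() {
  if (Arch >= MinArch) {
    return test_body();  // architecture provides the feature: run the test
  }
  return 0;  // architecture is too old: do nothing and report success
}
```

With this shape, the sm_60 pass of a multi-architecture build compiles the test to a no-op, while the sm_70 pass still runs it, and the test source itself stays unchanged.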

Tasks

[THEME] Initial design for libcudacxx architecture specific testing #1188
Refactor existing architecture specific tests using new design

jrhemstad · 2024-02-22T18:49:21Z

Relevant discussion here: #1375

This was referenced Nov 10, 2023

[EPIC]: Improve usability of architecture specific features in libcudacxx #1083

Open

[FEA]: Add SM90 to architecture list for CI #1092

Open

jrhemstad mentioned this issue Feb 22, 2024

Add additional build job for sm90 #1428

Merged

jrhemstad mentioned this issue May 14, 2024

Enable including atomics and friends in TUs that do not support them. #1736

Open
