[RFE]: cuda/std/tuple include in nvshmem_tensor.h breaks host compiler builds on CUDA 13

### Please provide the below details to ensure we understand your needs

### Symptom
`src/include/device_host/nvshmem_tensor.h`: line 37 unconditionally includes:
```cpp
#include "cuda/std/tuple"
```

In CUDA 13, the CCCL headers were reorganized and this file moved to `/usr/local/cuda-13.x/targets/x86_64-linux/include/cccl/cuda/std/tuple`, so it is no longer available at `/usr/local/cuda/include/cuda/std/tuple`.

`nvcc` handles this internally, but host compilers (`g++`/`c++`) do not get the `cccl/` path automatically. As a result, any project that compiles host code against the NVSHMEM headers fails with:
```
fatal error: cuda/std/tuple: No such file or directory
```

### Reproduction:
- CUDA 13.1, Ubuntu 24.04
- Build any C++ extension that includes nvshmem.h using a host compiler

My specific use case:
- Build NVSHMEM against CUDA 13:
```
export CUDA_HOME=/usr/local/cuda
export MPI_HOME=/opt/amazon/openmpi
export MPI_C_COMPILER=/opt/amazon/openmpi/bin/mpicc
export MPI_CXX_COMPILER=/opt/amazon/openmpi/bin/mpicxx
export LIBFABRIC_HOME=/opt/amazon/efa
export GDRCOPY_HOME=/usr/local/gdrdrv
export NVSHMEM_LIBFABRIC_SUPPORT=ON
export NVSHMEM_MPI_SUPPORT=OFF
export NVSHMEM_IBRC_SUPPORT=OFF
export NVSHMEM_IBGDA_SUPPORT=OFF
export NVSHMEM_IBDEVX_SUPPORT=OFF
export NVSHMEM_UCX_SUPPORT=OFF
export NVSHMEM_SHMEM_SUPPORT=OFF
export NVSHMEM_PMIX_SUPPORT=OFF
export NVSHMEM_USE_NCCL=OFF
export NVSHMEM_USE_GDRCOPY=ON
export NVSHMEM_USE_MLX5DV=OFF
export NVSHMEM_BUILD_TESTS=ON
export NVSHMEM_BUILD_EXAMPLES=OFF
export NVSHMEM_BUILD_PYTHON_LIB=OFF
export NVSHMEM_BUILD_BITCODE_LIBRARY=OFF
# Quoted so the shell does not treat the semicolons as command separators
export CMAKE_CUDA_ARCHITECTURES="80;90;100;120"
export NVSHMEM_DEBUG=False
make -j install
```
- Build [DeepEP](https://github.com/deepseek-ai/DeepEP) using the NVSHMEM install path:
```
CUDA_HOME=/usr/local/cuda NVSHMEM_DIR=<Path to the NVSHMEM build dir containing lib/ and include/> LIBFABRIC_HOME=/opt/amazon/efa PATH=/usr/local/cuda/bin:$PATH pip install --no-build-isolation .
```

### Potential Fix
```cpp
#if CUDA_VERSION >= 13000
#include "cccl/cuda/std/tuple"
#include "cccl/cuda/std/type_traits"
#else
#include "cuda/std/tuple"
#include "cuda/std/type_traits"
#endif
```

Alternatively, add the `cccl/` path to NVSHMEM's exported CMake include directories so that downstream projects pick it up automatically.

Not sure if this is something that NVSHMEM would like to include. Thank you in advance for your input. 
