[RFE]: cuda/std/tuple include in nvshmem_tensor.h breaks host compiler builds on CUDA 13 #69

@yunyub-1664303

Description

Symptom

Line 37 of src/include/device_host/nvshmem_tensor.h unconditionally includes:

```cpp
#include "cuda/std/tuple"
```

In CUDA 13, the CCCL headers were reorganized and this file moved to /usr/local/cuda-13.x/targets/x86_64-linux/include/cccl/cuda/std/tuple, so it is no longer available at /usr/local/cuda/include/cuda/std/tuple.

nvcc handles this internally, but host compilers (g++/c++) do not get the cccl/ path automatically. This causes any project that compiles host code against NVSHMEM headers to fail with:

fatal error: cuda/std/tuple: No such file or directory

Reproduction:

  • CUDA 13.1, Ubuntu 24.04
  • Build any C++ extension that includes nvshmem.h using a host compiler
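
To check which header layout a given host toolchain actually sees, a small probe along the following lines may help. It is only a sketch: it assumes the compiler supports `__has_include` (standard since C++17, and a long-standing extension in GCC and Clang), and the names `EXAMPLE_TUPLE_LOCATION` and `tuple_header_location` are hypothetical, not part of NVSHMEM or CCCL. It only detects the header and never includes it, so it compiles even when CUDA is absent.

```cpp
// probe_cccl.cpp: report where cuda/std/tuple is visible to this compiler.
// EXAMPLE_TUPLE_LOCATION and tuple_header_location are hypothetical names
// used for illustration only; they are not part of NVSHMEM or CCCL.
#include <string>

// Nest the checks so that compilers without __has_include still parse this.
#if defined(__has_include)
#  if __has_include(<cuda/std/tuple>)
     // Pre-CUDA-13 layout, or the cccl/ directory is already on the include path.
#    define EXAMPLE_TUPLE_LOCATION "cuda/std/tuple"
#  elif __has_include(<cccl/cuda/std/tuple>)
     // Only reachable through the cccl/ prefix (the CUDA 13 layout described above).
#    define EXAMPLE_TUPLE_LOCATION "cccl/cuda/std/tuple"
#  endif
#endif

// Neither path resolved, or the compiler cannot answer the question.
#ifndef EXAMPLE_TUPLE_LOCATION
#  define EXAMPLE_TUPLE_LOCATION "not found"
#endif

// Returns the first include path under which the header was found.
std::string tuple_header_location() {
    return EXAMPLE_TUPLE_LOCATION;
}
```

Printing `tuple_header_location()` from a small `main`, compiled with the same `-I` flags as the failing build, should show whether the host compiler resolves the header directly, only under `cccl/`, or not at all.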

My specific use case

  • Built NVSHMEM against CUDA 13:
export CUDA_HOME=/usr/local/cuda
export MPI_HOME=/opt/amazon/openmpi
export MPI_C_COMPILER=/opt/amazon/openmpi/bin/mpicc
export MPI_CXX_COMPILER=/opt/amazon/openmpi/bin/mpicxx
export LIBFABRIC_HOME=/opt/amazon/efa
export GDRCOPY_HOME=/usr/local/gdrdrv
export NVSHMEM_LIBFABRIC_SUPPORT=ON
export NVSHMEM_MPI_SUPPORT=OFF
export NVSHMEM_IBRC_SUPPORT=OFF
export NVSHMEM_IBGDA_SUPPORT=OFF
export NVSHMEM_IBDEVX_SUPPORT=OFF
export NVSHMEM_UCX_SUPPORT=OFF
export NVSHMEM_SHMEM_SUPPORT=OFF
export NVSHMEM_PMIX_SUPPORT=OFF
export NVSHMEM_USE_NCCL=OFF
export NVSHMEM_USE_GDRCOPY=ON
export NVSHMEM_USE_MLX5DV=OFF
export NVSHMEM_BUILD_TESTS=ON
export NVSHMEM_BUILD_EXAMPLES=OFF
export NVSHMEM_BUILD_PYTHON_LIB=OFF
export NVSHMEM_BUILD_BITCODE_LIBRARY=OFF
export CMAKE_CUDA_ARCHITECTURES="80;90;100;120"
export NVSHMEM_DEBUG=False
make -j install
  • Built DeepEP using the NVSHMEM install path:
CUDA_HOME=/usr/local/cuda NVSHMEM_DIR=<Path to the NVSHMEM build dir containing lib/ and include/> LIBFABRIC_HOME=/opt/amazon/efa PATH=/usr/local/cuda/bin:$PATH pip install --no-build-isolation .

Potential Fix

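```cpp
#include <cuda.h>  // defines CUDA_VERSION; without it the check below evaluates to 0

#if CUDA_VERSION >= 13000
// CUDA 13 and later: CCCL headers live under the cccl/ subdirectory
#include "cccl/cuda/std/tuple"
#include "cccl/cuda/std/type_traits"
#else
// Earlier toolkits: headers sit directly under the CUDA include directory
#include "cuda/std/tuple"
#include "cuda/std/type_traits"
#endif
```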

Alternatively, add the cccl/ path to NVSHMEM's exported CMake include directories so that downstream projects pick it up automatically.

Not sure if this is something that NVSHMEM would like to include. Thank you in advance for your input.


Labels: enhancement
