Support CUDA 12.2 #5711

Merged
merged 20 commits into from
Feb 12, 2024

Conversation

jameslamb
Member

@jameslamb jameslamb commented Jan 9, 2024

Description

Closes #5765

Notes for Reviewers

This is part of ongoing work to build and test wheels against CUDA 12.2.2 across all of RAPIDS. For more details see:

(created with rapids-reviser)

@jameslamb jameslamb requested a review from a team as a code owner January 9, 2024 15:41
@github-actions github-actions bot added the conda (conda issue) label Jan 11, 2024
@jameslamb jameslamb marked this pull request as draft January 11, 2024 19:21
@jameslamb jameslamb changed the title use CUDA 12.2 for building and testing wheels WIP: use CUDA 12.2 for building and testing wheels Jan 11, 2024
@jameslamb jameslamb changed the title WIP: use CUDA 12.2 for building and testing wheels WIP: add CUDA 12.2 support for conda packages and wheels Jan 11, 2024
@jameslamb jameslamb changed the title WIP: add CUDA 12.2 support for conda packages and wheels WIP: (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels Jan 11, 2024
@jameslamb jameslamb changed the title WIP: (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels Jan 12, 2024
@jameslamb jameslamb marked this pull request as ready for review January 12, 2024 19:26
@jameslamb
Member Author

The new CUDA 12.2 builds are failing with 2 issues.

conda packages

add_library cannot create imported target "CCCL::Thrust" because another
  target with the same name already exists.

(build link)

That should be fixed by rapidsai/rapids-cmake#522.

wheels

  FAILED: cuml-cpp/CMakeFiles/cuml++.dir/src/explainer/tree_shap.cu.o
  sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DCUML_CPP_API -DCUML_ENABLE_GPU -DCUTLASS_ENABLE_CUDNN=1 -DCUTLASS_NAMESPACE=raft_cutlass -DDISABLE_CUSPARSE_DEPRECATED -DFMT_HEADER_ONLY=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRAFT_COMPILED -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DRMM_STATIC_CUDART -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -Dcuml___EXPORTS -I/__w/cuml/cuml/cpp/include -I/__w/cuml/cuml/cpp/src -I/__w/cuml/cuml/cpp/src/metrics -I/__w/cuml/cuml/cpp/src_prims -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/raft-src/cpp/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/treelite-src/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/treelite-build/include -I/__w/cuml/cuml/cumlprims_mg/cpp/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/rmm-src/include -I/usr/local/cuda/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/fmt-src/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/spdlog-src/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/cuco-src/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/nvidiacutlass-src/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/nvidiacutlass-build/include -I/__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/gputreeshap-src -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC --expt-extended-lambda --expt-relaxed-constexpr -Werror=all-warnings 
-Xcompiler=-Wall,-Werror,-Wno-error=deprecated-declarations,-Wno-error=sign-compare -Wno-deprecated-declarations -Xcompiler=-Wno-deprecated-declarations -Xfatbin=-compress-all -Xcompiler=-fopenmp -MD -MT cuml-cpp/CMakeFiles/cuml++.dir/src/explainer/tree_shap.cu.o -MF cuml-cpp/CMakeFiles/cuml++.dir/src/explainer/tree_shap.cu.o.d -x cu -c /__w/cuml/cuml/cpp/src/explainer/tree_shap.cu -o cuml-cpp/CMakeFiles/cuml++.dir/src/explainer/tree_shap.cu.o
  /__w/cuml/cuml/python/build/cp39-cp39-manylinux_2_28_aarch64/_deps/gputreeshap-src/GPUTreeShap/gpu_treeshap.h(464):
  error #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
                PathElement<SplitConditionT> s_elements[kBlockSize];
                                             ^
            detected during:
              instantiation of "void gpu_treeshap::detail::ShapKernel<DatasetT,kBlockSize,kRowsPerWarp,SplitConditionT>(DatasetT, size_t, const gpu_treeshap::PathElement<SplitConditionT> *, const size_t *, size_t, double *) [with DatasetT=<unnamed>::DenseDatasetWrapper<float>, kBlockSize=256UL, kRowsPerWarp=1024UL, SplitConditionT=<unnamed>::SplitCondition<float>]" at line 508
              instantiation of "void gpu_treeshap::detail::ComputeShap(DatasetT, const thrust::device_vector<size_t, SizeTAllocatorT> &, const thrust::device_vector<gpu_treeshap::PathElement<SplitConditionT>, PathAllocatorT> &, size_t, double *) [with DatasetT=<unnamed>::DenseDatasetWrapper<float>, SizeTAllocatorT=thrust::device_allocator<size_t>, PathAllocatorT=thrust::device_allocator<gpu_treeshap::PathElement<<unnamed>::SplitCondition<float>>>, SplitConditionT=<unnamed>::SplitCondition<float>]" at line 1313
              instantiation of "void gpu_treeshap::GPUTreeShap(DatasetT, PathIteratorT, PathIteratorT, size_t, PhiIteratorT, PhiIteratorT) [with DeviceAllocatorT=thrust::device_allocator<int>, DatasetT=<unnamed>::DenseDatasetWrapper<float>, PathIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<gpu_treeshap::PathElement<<unnamed>::SplitCondition<float>>>>, PhiIteratorT=thrust::device_ptr<float>]" at line 453 of /__w/cuml/cuml/cpp/src/explainer/tree_shap.cu
              instantiation of "void <unnamed>::gpu_treeshap_impl(ML::Explainer::TreePathInfo<ThresholdT> *, const DataT *, std::size_t, std::size_t, DataT *, std::size_t) [with ThresholdT=float, DataT=float]" at line 791 of /__w/cuml/cuml/cpp/src/explainer/tree_shap.cu

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  1 error detected in the compilation of "/__w/cuml/cuml/cpp/src/explainer/tree_shap.cu".

(build link)

That looks similar to something raft faced with CUDA 12.2: rapidsai/raft#1870.

@jakirkham jakirkham added the feature request (New feature or request), non-breaking (Non-breaking change), and 5 - DO NOT MERGE (Hold off on merging; see PR for details) labels Jan 13, 2024
@jakirkham
Member

Looks like this needs PR ( rapidsai/gputreeshap#42 )

@jakirkham
Member

> Looks like this needs PR ( rapidsai/gputreeshap#42 )

Testing this in PR ( #5719 )

@jameslamb jameslamb requested a review from a team as a code owner January 17, 2024 15:15
@github-actions github-actions bot added the CMake label Jan 17, 2024
@jameslamb jameslamb changed the base branch from branch-24.02 to branch-24.04 January 22, 2024 15:38
@jakirkham
Member

jakirkham commented Jan 24, 2024

It looks like CUDA 12.2 passes! 🎉

There is a CUDA 11.8 failure with Dask, though. After a cursory look, I am not seeing any obvious cause.

test_sparse_from_dense[8-float32-regularization0-True]

...

E           assert <array_equal: [ 0.64805114  0.40563476 -0.03294854 ... -0.20645222 -0.2884561 -0.13163252] [ 0.64738656  0.40191636 -0.03218034 ... -0.20163251 -0.28334481 -0.12986547] unit_tol=0.005 total_tol=0.0001 with_sign=True>
E            +  where <array_equal: [ 0.64805114  0.40563476 -0.03294854 ... -0.20645222 -0.2884561 -0.13163252] [ 0.64738656  0.40191636 -0.03218034 ... -0.20163251 -0.28334481 -0.12986547] unit_tol=0.005 total_tol=0.0001 with_sign=True> = array_equal(array([ 0.64805114,  0.40563476, -0.03294854,  0.05359353, -0.44779027,\n       -0.20645222, -0.2884561 , -0.13163252], dtype=float32), array([ 0.64738656,  0.40191636, -0.03218034,  0.05311682, -0.45539661,\n       -0.20163251, -0.28334481, -0.12986547]), 0.005, with_sign=True)

test_dask_logistic_regression.py:393: AssertionError
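
To make the assertion message above easier to read, here is a minimal sketch of a tolerance check of this shape. It assumes that `unit_tol` is the allowed absolute difference per element and that `total_tol` is the largest fraction of elements allowed to exceed it. That interpretation is an assumption here, and the helper names `fraction_outside_tolerance` and `arrays_match` are hypothetical, not cuML's API. The values are copied from the log.

```python
def fraction_outside_tolerance(a, b, unit_tol):
    """Return the fraction of element pairs that differ by more than unit_tol."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    if len(a) == 0:
        return 0.0
    exceeding = sum(1 for x, y in zip(a, b) if abs(x - y) > unit_tol)
    return exceeding / len(a)


def arrays_match(a, b, unit_tol=0.005, total_tol=0.0001):
    """Assumed semantics: pass only if fewer than total_tol of the elements differ."""
    return fraction_outside_tolerance(a, b, unit_tol) < total_tol


# Values copied from the assertion message in the failing test log.
actual = [0.64805114, 0.40563476, -0.03294854, 0.05359353,
          -0.44779027, -0.20645222, -0.2884561, -0.13163252]
expected = [0.64738656, 0.40191636, -0.03218034, 0.05311682,
            -0.45539661, -0.20163251, -0.28334481, -0.12986547]

fraction = fraction_outside_tolerance(actual, expected, unit_tol=0.005)
print(f"fraction outside unit_tol: {fraction}")
print(f"arrays match: {arrays_match(actual, expected)}")
```

Under these assumptions, two of the eight coefficients (the fifth and the seventh) differ by more than 0.005. With only eight elements and `total_tol=0.0001`, a single out-of-tolerance element is already enough to fail the check.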

@jameslamb
Member Author

> there is an 11.8 failure with Dask though

Looks like @dantegd re-triggered that build and now the test passed: https://github.com/rapidsai/cuml/actions/runs/7642195211/job/20839681496?pr=5711

So maybe it is a flaky test? Either way, it seems unrelated to CUDA 12.2 support, since it's happening on the 11.8 job.

@jameslamb jameslamb changed the title (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels Support CUDA 12.2 Jan 25, 2024
@vyasr
Contributor

vyasr commented Feb 9, 2024

Restarting CI now that the pandas 2 fixes are merged in #5758. We can check out the logs tomorrow.

@jakirkham jakirkham removed the 5 - DO NOT MERGE (Hold off on merging; see PR for details) label Feb 10, 2024
@bdice
Contributor

bdice commented Feb 11, 2024

Ready for ops review - CI looks good. Triggering /merge.

@bdice
Contributor

bdice commented Feb 11, 2024

/merge

@rapids-bot rapids-bot bot merged commit 1c570a3 into rapidsai:branch-24.04 Feb 12, 2024
55 checks passed
rapids-bot bot pushed a commit that referenced this pull request Feb 21, 2024
Follow-up to #5711

For all GitHub Actions configs, replaces uses of the `test-cuda-12.2` branch on `shared-workflows`
with `branch-24.04`, now that rapidsai/shared-workflows#166 has been merged.

### Notes for Reviewers

This is part of ongoing work to build and test packages against CUDA 12.2 across all of RAPIDS.

For more details see:

* rapidsai/build-planning#7

*(created with `rapids-reviser`)*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #5776
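
The follow-up commit above re-pins the GitHub Actions configs from the `test-cuda-12.2` branch of `shared-workflows` to `branch-24.04`. A minimal sketch of that kind of rewrite is shown below. The function name `repin_shared_workflows` and the workflow file name `example.yaml` are hypothetical and used only for illustration; this is not the tooling that was actually used.

```python
import re

OLD_REF = "test-cuda-12.2"
NEW_REF = "branch-24.04"


def repin_shared_workflows(yaml_text, old_ref=OLD_REF, new_ref=NEW_REF):
    """Replace the ref on `uses:` lines that point at rapidsai/shared-workflows."""
    pattern = re.compile(
        r"(uses:\s*rapidsai/shared-workflows/\S+@)" + re.escape(old_ref) + r"(?=\s|$)",
        flags=re.MULTILINE,
    )
    # A function replacement avoids treating new_ref as a regex template.
    return pattern.sub(lambda m: m.group(1) + new_ref, yaml_text)


# Illustrative workflow fragment; `example.yaml` is a made-up file name.
workflow = """\
jobs:
  build:
    uses: rapidsai/shared-workflows/.github/workflows/example.yaml@test-cuda-12.2
  checkout:
    uses: actions/checkout@v4
"""

print(repin_shared_workflows(workflow))
```

Only `uses:` entries under `rapidsai/shared-workflows` that are pinned to the old ref are changed; other actions keep their existing pins.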
@jameslamb jameslamb deleted the test-cuda-12.2 branch April 25, 2024 22:28
Labels
conda (conda issue), feature request (New feature or request), non-breaking (Non-breaking change)
Development

Successfully merging this pull request may close these issues.

CUDA 12 packages missing cuda-cudart runtime dependency?
6 participants