
Various fixes to cub::DeviceTransform #2709


Merged: 1 commit merged into NVIDIA:main from the transform_fixes branch on Nov 6, 2024

@bernhardmgruber bernhardmgruber commented Nov 5, 2024

This PR splits out several unrelated fixes from #2389:

  • Work around non-copyable iterators
  • Use a named constant for shared memory (SMEM)
  • Cast to raw reference 2
  • Fix passing non-copy-assignable iterators to transform_kernel via kernel_arg

The generated SASS for cub.test.device_transform.lid_0 is unchanged by this PR.
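The last fix in the list above concerns iterators that are copy-constructible but not copy-assignable. One example is an iterator holding a capturing lambda, because the closure type has a deleted copy assignment operator. The sketch below is a hypothetical illustration, not code from this PR and not CUB's actual `kernel_arg`. It assumes a union-based argument wrapper and uses invented names (`offset_iterator`, `kernel_arg_sketch`, `make_iterator_arg`). It shows why storing such an iterator into the wrapper by assignment does not compile, and how copy-constructing it in place with placement new avoids the problem.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

// Hypothetical iterator: copy-constructible but NOT copy-assignable.
// The const member deletes the implicit copy assignment operator, much like
// an iterator holding a capturing lambda would.
struct offset_iterator
{
  const int* base;
  const int offset;

  int operator[](int i) const { return base[i] + offset; }
};

static_assert(std::is_copy_constructible<offset_iterator>::value,
              "copy construction must work");
static_assert(!std::is_copy_assignable<offset_iterator>::value,
              "copy assignment is deleted");

// Hypothetical union-based wrapper: one argument type that can carry either a
// raw pointer (used by some other code path) or an arbitrary iterator.
template <typename It>
union kernel_arg_sketch
{
  const void* aligned_ptr;
  It iterator;

  // Only the pointer member is initialized; It may lack a default constructor.
  kernel_arg_sketch() : aligned_ptr(nullptr) {}
};

// Store an iterator in the wrapper without using It's copy assignment.
template <typename It>
kernel_arg_sketch<It> make_iterator_arg(const It& it)
{
  kernel_arg_sketch<It> arg;
  // arg.iterator = it;  // ill-formed: It's copy assignment is deleted
  ::new (static_cast<void*>(&arg.iterator)) It(it); // copy-construct in place
  return arg;
}
```

For example, with `int data[] = {10, 20, 30};` and `offset_iterator it{data, 5};`, `make_iterator_arg(it).iterator[1]` yields 25. The wrapper only ever copy-constructs the iterator, which such types support, so the deleted assignment operator is never invoked.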

@bernhardmgruber bernhardmgruber added the cub For all items related to CUB label Nov 5, 2024
@bernhardmgruber bernhardmgruber marked this pull request as ready for review November 5, 2024 16:27
@bernhardmgruber bernhardmgruber requested review from a team as code owners November 5, 2024 16:27

github-actions bot commented Nov 5, 2024

🟩 CI finished in 3h 22m: Pass: 100%/222 | Total: 2d 13h | Avg: 16m 35s | Max: 1h 33m | Hits: 89%/16113
  • 🟩 cub: Pass: 100%/110 | Total: 1d 21h | Avg: 24m 54s | Max: 1h 33m | Hits: 76%/2948

    🟩 cpu
      🟩 amd64              Pass: 100%/102 | Total:  1d 15h | Avg: 23m 26s | Max:  1h 33m | Hits:  76%/2948  
      🟩 arm64              Pass: 100%/8   | Total:  5h 49m | Avg: 43m 42s | Max: 49m 42s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 50m | Avg:  7m 20s | Max: 43m 53s | Hits:  77%/737   
      🟩 11.8               Pass: 100%/3   | Total:  2h 18m | Avg: 46m 11s | Max: 47m 33s
      🟩 12.5               Pass: 100%/4   | Total:  2h 32m | Avg: 38m 03s | Max: 38m 46s
      🟩 12.6               Pass: 100%/88  | Total:  1d 14h | Avg: 26m 35s | Max:  1h 33m | Hits:  76%/2211  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 17m 13s | Avg:  4m 18s | Max:  4m 26s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 50m | Avg:  7m 20s | Max: 43m 53s | Hits:  77%/737   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 18m | Avg: 46m 11s | Max: 47m 33s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  2h 32m | Avg: 38m 03s | Max: 38m 46s
      🟩 nvcc12.6           Pass: 100%/84  | Total:  1d 14h | Avg: 27m 38s | Max:  1h 33m | Hits:  76%/2211  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 17m 13s | Avg:  4m 18s | Max:  4m 26s
      🟩 nvcc               Pass: 100%/106 | Total:  1d 21h | Avg: 25m 41s | Max:  1h 33m | Hits:  76%/2948  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 57m | Avg: 19m 37s | Max: 35m 24s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 14m | Avg: 24m 52s | Max: 34m 36s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 29s | Max: 33m 44s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 22m | Avg: 35m 44s | Max: 37m 20s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 24m | Avg: 36m 10s | Max: 38m 06s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 24m | Avg: 36m 07s | Max: 38m 04s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 44s | Max: 37m 50s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 18m | Avg: 19m 44s | Max: 34m 52s
      🟩 Clang17            Pass: 100%/4   | Total:  2h 20m | Avg: 35m 04s | Max: 36m 57s
      🟩 Clang18            Pass: 100%/11  | Total:  4h 45m | Avg: 25m 59s | Max: 45m 16s
      🟩 GCC6               Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  4m 44s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 30m | Avg: 15m 07s | Max: 36m 56s
      🟩 GCC8               Pass: 100%/6   | Total: 29m 58s | Avg:  4m 59s | Max:  5m 39s
      🟩 GCC9               Pass: 100%/6   | Total:  1h 03m | Avg: 10m 30s | Max: 36m 59s
      🟩 GCC10              Pass: 100%/4   | Total: 21m 52s | Avg:  5m 28s | Max:  5m 45s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 10m | Avg: 27m 12s | Max: 47m 33s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 22m | Avg: 20m 37s | Max: 36m 23s
      🟩 GCC13              Pass: 100%/16  | Total:  8h 03m | Avg: 30m 13s | Max:  1h 33m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 50m 38s | Avg: 16m 52s | Max: 37m 24s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 43m 53s | Avg: 43m 53s | Max: 43m 53s | Hits:  77%/737   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 36m | Avg: 48m 07s | Max: 48m 25s | Hits:  76%/1474  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 51m 29s | Avg: 51m 29s | Max: 51m 29s | Hits:  76%/737   
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  2h 32m | Avg: 38m 03s | Max: 38m 46s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total: 22h 54m | Avg: 28m 38s | Max: 45m 16s
      🟩 GCC                Pass: 100%/51  | Total: 16h 11m | Avg: 19m 02s | Max:  1h 33m
      🟩 Intel              Pass: 100%/3   | Total: 50m 38s | Avg: 16m 52s | Max: 37m 24s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 11m | Avg: 47m 54s | Max: 51m 29s | Hits:  76%/2948  
      🟩 NVHPC              Pass: 100%/4   | Total:  2h 32m | Avg: 38m 03s | Max: 38m 46s
    🟩 gpu
      🟩 v100               Pass: 100%/110 | Total:  1d 21h | Avg: 24m 54s | Max:  1h 33m | Hits:  76%/2948  
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total:  1d 16h | Avg: 24m 05s | Max: 51m 29s | Hits:  76%/2948  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 24s | Avg: 19m 24s | Max: 19m 24s
      🟩 GraphCapture       Pass: 100%/1   | Total:  1h 33m | Avg:  1h 33m | Max:  1h 33m
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 03m | Avg: 21m 00s | Max: 23m 13s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 47m | Avg: 35m 49s | Max: 45m 16s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 18m | Avg: 46m 11s | Max: 47m 33s
      🟩 90a                Pass: 100%/4   | Total: 18m 22s | Avg:  4m 35s | Max:  4m 42s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 25m | Avg: 20m 50s | Max: 49m 42s
      🟩 14                 Pass: 100%/29  | Total: 11h 17m | Avg: 23m 22s | Max: 48m 25s | Hits:  77%/1474  
      🟩 17                 Pass: 100%/27  | Total: 11h 07m | Avg: 24m 42s | Max: 47m 49s | Hits:  76%/737   
      🟩 20                 Pass: 100%/24  | Total: 12h 50m | Avg: 32m 05s | Max:  1h 33m | Hits:  76%/737   
    
  • 🟩 thrust: Pass: 100%/109 | Total: 15h 11m | Avg: 8m 21s | Max: 45m 29s | Hits: 92%/13165

    🟩 cpu
      🟩 amd64              Pass: 100%/101 | Total: 12h 54m | Avg:  7m 39s | Max: 45m 29s | Hits:  92%/13165 
      🟩 arm64              Pass: 100%/8   | Total:  2h 17m | Avg: 17m 12s | Max: 20m 55s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 43m | Avg:  6m 55s | Max: 40m 38s | Hits:  94%/2633  
      🟩 11.8               Pass: 100%/3   | Total: 15m 44s | Avg:  5m 14s | Max:  6m 13s
      🟩 12.5               Pass: 100%/4   | Total:  1h 04m | Avg: 16m 08s | Max: 16m 53s
      🟩 12.6               Pass: 100%/87  | Total: 12h 07m | Avg:  8m 21s | Max: 45m 29s | Hits:  92%/10532 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 19m 50s | Avg:  4m 57s | Max:  5m 06s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 43m | Avg:  6m 55s | Max: 40m 38s | Hits:  94%/2633  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 15m 44s | Avg:  5m 14s | Max:  6m 13s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  1h 04m | Avg: 16m 08s | Max: 16m 53s
      🟩 nvcc12.6           Pass: 100%/83  | Total: 11h 47m | Avg:  8m 31s | Max: 45m 29s | Hits:  92%/10532 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 19m 50s | Avg:  4m 57s | Max:  5m 06s
      🟩 nvcc               Pass: 100%/105 | Total: 14h 51m | Avg:  8m 29s | Max: 45m 29s | Hits:  92%/13165 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 33m 52s | Avg:  5m 38s | Max:  6m 46s
      🟩 Clang10            Pass: 100%/3   | Total: 19m 42s | Avg:  6m 34s | Max:  7m 04s
      🟩 Clang11            Pass: 100%/4   | Total: 22m 27s | Avg:  5m 36s | Max:  5m 51s
      🟩 Clang12            Pass: 100%/4   | Total: 20m 48s | Avg:  5m 12s | Max:  5m 35s
      🟩 Clang13            Pass: 100%/4   | Total: 21m 53s | Avg:  5m 28s | Max:  5m 43s
      🟩 Clang14            Pass: 100%/4   | Total: 21m 38s | Avg:  5m 24s | Max:  5m 52s
      🟩 Clang15            Pass: 100%/4   | Total: 21m 19s | Avg:  5m 19s | Max:  5m 32s
      🟩 Clang16            Pass: 100%/4   | Total: 22m 05s | Avg:  5m 31s | Max:  5m 51s
      🟩 Clang17            Pass: 100%/4   | Total: 21m 07s | Avg:  5m 16s | Max:  5m 34s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 51m | Avg: 10m 05s | Max: 19m 04s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  4m 14s
      🟩 GCC7               Pass: 100%/6   | Total: 31m 10s | Avg:  5m 11s | Max:  7m 02s
      🟩 GCC8               Pass: 100%/6   | Total: 29m 23s | Avg:  4m 53s | Max:  5m 34s
      🟩 GCC9               Pass: 100%/6   | Total: 30m 06s | Avg:  5m 01s | Max:  6m 00s
      🟩 GCC10              Pass: 100%/4   | Total: 21m 58s | Avg:  5m 29s | Max:  6m 01s
      🟩 GCC11              Pass: 100%/7   | Total: 37m 11s | Avg:  5m 18s | Max:  6m 13s
      🟩 GCC12              Pass: 100%/4   | Total: 23m 23s | Avg:  5m 50s | Max:  6m 24s
      🟩 GCC13              Pass: 100%/14  | Total:  2h 23m | Avg: 10m 15s | Max: 20m 55s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 20m 21s | Avg:  6m 47s | Max:  7m 13s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 40m 38s | Avg: 40m 38s | Max: 40m 38s | Hits:  94%/2633  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 19m | Avg: 39m 38s | Max: 40m 24s | Hits:  89%/5266  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 06m | Avg: 33m 00s | Max: 45m 29s | Hits:  95%/5266  
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  1h 04m | Avg: 16m 08s | Max: 16m 53s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  5h 15m | Avg:  6m 34s | Max: 19m 04s
      🟩 GCC                Pass: 100%/49  | Total:  5h 25m | Avg:  6m 38s | Max: 20m 55s
      🟩 Intel              Pass: 100%/3   | Total: 20m 21s | Avg:  6m 47s | Max:  7m 13s
      🟩 MSVC               Pass: 100%/5   | Total:  3h 05m | Avg: 37m 11s | Max: 45m 29s | Hits:  92%/13165 
      🟩 NVHPC              Pass: 100%/4   | Total:  1h 04m | Avg: 16m 08s | Max: 16m 53s
    🟩 gpu
      🟩 v100               Pass: 100%/109 | Total: 15h 11m | Avg:  8m 21s | Max: 45m 29s | Hits:  92%/13165 
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total: 13h 50m | Avg:  8m 08s | Max: 45m 29s | Hits:  91%/10532 
      🟩 TestCPU            Pass: 100%/4   | Total: 42m 29s | Avg: 10m 37s | Max: 20m 31s | Hits:  99%/2633  
      🟩 TestGPU            Pass: 100%/3   | Total: 38m 31s | Avg: 12m 50s | Max: 14m 13s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 15m 44s | Avg:  5m 14s | Max:  6m 13s
      🟩 90a                Pass: 100%/4   | Total: 18m 59s | Avg:  4m 44s | Max:  4m 59s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  3h 05m | Avg:  6m 10s | Max: 14m 31s
      🟩 14                 Pass: 100%/29  | Total:  4h 17m | Avg:  8m 52s | Max: 40m 38s | Hits:  91%/5266  
      🟩 17                 Pass: 100%/27  | Total:  3h 45m | Avg:  8m 21s | Max: 40m 24s | Hits:  90%/2633  
      🟩 20                 Pass: 100%/23  | Total:  4h 03m | Avg: 10m 35s | Max: 45m 29s | Hits:  95%/5266  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 57s | Avg: 5m 58s | Max: 9m 45s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 45s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 45s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 45s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 45s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 45s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 45s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s
      🟩 Test               Pass: 100%/1   | Total:  9m 45s | Avg:  9m 45s | Max:  9m 45s
    
  • 🟩 python: Pass: 100%/1 | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 19m 59s | Avg: 19m 59s | Max: 19m 59s
    

👃 Inspect Changes

Modifications in project?

  • Modified: CUB
  • Unchanged: CCCL Infrastructure, libcu++, Thrust, CUDA Experimental, python, CCCL C Parallel Library, Catch2Helper

Modifications in project or dependencies?

  • Modified: CUB, Thrust, python, CCCL C Parallel Library, Catch2Helper
  • Unchanged: CCCL Infrastructure, libcu++, CUDA Experimental

🏃‍ Runner counts (total jobs: 222)

  • linux-amd64-cpu16: 184
  • linux-arm64-cpu16: 16
  • linux-amd64-gpu-v100-latest-1: 13
  • windows-amd64-cpu16: 9

@bernhardmgruber bernhardmgruber merged commit c358bde into NVIDIA:main Nov 6, 2024
240 checks passed
@bernhardmgruber bernhardmgruber deleted the transform_fixes branch November 6, 2024 08:21
pciolkosz pushed a commit to pciolkosz/cccl that referenced this pull request Nov 6, 2024
fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 9, 2024
pciolkosz added a commit that referenced this pull request Nov 11, 2024
* copy pasted sample

* First draft

* Kernel functor and some other things

* Clean up and break up long main function

* Needs launch fix

* Switch to copy_bytes and cleanups

* Missing include

* Add exception print and waive value

* Adjust copy count

* Add license and switch benchmark streams

* Remove a function left as a mistake

* Update copyright date

Co-authored-by: Eric Niebler

* Setup cudax examples. (#2697)

* Move the sample to new location and fix warning

* build fixes and 0 return code on waive

* Some new MSVC errors

* explicit cast

* Rename enable/disable peer access and separate the sample loop

* Add `cuda::minimum` and `cuda::maximum` (#2681)

* Add cuda::minimum and cuda::maximum

* Various fixes to cub::DeviceTransform (#2709)

* Workaround non-copyable iterators
* Use a named constant for SMEM
* Cast to raw reference 2
* Fix passing non-copy-assignable iterators to transform_kernel via kernel_arg

* Make `thrust::transform` use `cub::DeviceTransform` (#2389)

* Add transform benchmark requiring a stable address
* Make thrust::transform use cub::DeviceTransform
* Introduces address stability detection and opt-in in libcu++
* Mark lambdas in Thrust BabelStream benchmark address oblivious
* Optimize prefetch cub::DeviceTransform for small problems

Fixes: #2263

* Ensure that we only use the inline variable trait when it is actually available (#2712)

* Ensure that we only use the inline variable trait when it is actually available

* Use the right define for internal traits

* [CUDAX] Rename memory resource and memory pool from async to device (#2710)

* Rename the type

* Update tests

* Rename async memory pool

* Rename the tests

* Change name in the docs

* Generalise the memory_pool_properties name

* Fix docs

---------

Co-authored-by: Michael Schellenberger Costa

* Update memory resource name

---------

Co-authored-by: Eric Niebler
Co-authored-by: Allison Piper
Co-authored-by: Jacob Faibussowitsch
Co-authored-by: Bernhard Manfred Gruber
Co-authored-by: Michael Schellenberger Costa
fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 12, 2024
fbusato pushed a commit to fbusato/cccl that referenced this pull request Jan 9, 2025
Labels
cub For all items related to CUB
Projects
Archived in project

2 participants