Drop cub::DeviceTransform fallback to cub::DeviceFor #2660

Merged on Oct 30, 2024 (1 commit)

@bernhardmgruber bernhardmgruber commented Oct 30, 2024

We previously had a fallback algorithm that would use cub::DeviceFor. Benchmarks (see #2396) showed that the prefetch algorithm is always superior to that fallback, so let's remove it.
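
For readers unfamiliar with the distinction, here is a rough host-only C++ sketch of what "a transform that falls back to a generic for-each primitive" can look like next to a dedicated transform loop. It does not use CUB and is not the code removed by this PR; `for_each_index`, `transform_via_for`, `transform_dedicated`, and `paths_agree` are hypothetical names chosen for illustration, and the sketch says nothing about the prefetching itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a generic "call f(i) for every index"
// primitive. A device-side primitive would run the calls in parallel; this
// one is sequential.
template <typename F>
void for_each_index(std::size_t n, F f)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    f(i);
  }
}

// Fallback-style transform: layered on top of the generic index loop.
// Assumes b has at least as many elements as a.
template <typename T, typename Op>
std::vector<T> transform_via_for(const std::vector<T>& a, const std::vector<T>& b, Op op)
{
  std::vector<T> out(a.size());
  for_each_index(out.size(), [&](std::size_t i) { out[i] = op(a[i], b[i]); });
  return out;
}

// Dedicated transform: owns its loop, so an implementation is free to tune
// how inputs are loaded. Here it is only a plain loop.
template <typename T, typename Op>
std::vector<T> transform_dedicated(const std::vector<T>& a, const std::vector<T>& b, Op op)
{
  std::vector<T> out(a.size());
  for (std::size_t i = 0; i < out.size(); ++i)
  {
    out[i] = op(a[i], b[i]);
  }
  return out;
}

// Returns true when both paths produce the same output for a small input.
inline bool paths_agree()
{
  const std::vector<int> a{1, 2, 3, 4};
  const std::vector<int> b{10, 20, 30, 40};
  auto add = [](int x, int y) { return x + y; };
  return transform_via_for(a, b, add) == transform_dedicated(a, b, add);
}
```

Both paths compute the same result, so dropping one of them changes performance characteristics rather than behavior, which is why benchmarks alone can justify the removal.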

Depends on:

@bernhardmgruber bernhardmgruber requested review from a team as code owners October 30, 2024 11:14
@bernhardmgruber bernhardmgruber added the cub For all items related to CUB label Oct 30, 2024

🟩 CI finished in 44m 40s: Pass: 100%/222 | Total: 1d 00h | Avg: 6m 40s | Max: 30m 31s | Hits: 99%/16089
  • 🟩 cub: Pass: 100%/110 | Total: 12h 02m | Avg: 6m 33s | Max: 30m 31s | Hits: 99%/2924

    🟩 cpu
      🟩 amd64              Pass: 100%/102 | Total: 11h 25m | Avg:  6m 43s | Max: 30m 31s | Hits:  99%/2924  
      🟩 arm64              Pass: 100%/8   | Total: 36m 44s | Avg:  4m 35s | Max:  5m 10s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 11m | Avg:  4m 44s | Max: 13m 15s | Hits:  99%/731   
      🟩 11.8               Pass: 100%/3   | Total: 14m 47s | Avg:  4m 55s | Max:  5m 15s
      🟩 12.5               Pass: 100%/4   | Total: 31m 29s | Avg:  7m 52s | Max:  8m 17s
      🟩 12.6               Pass: 100%/88  | Total: 10h 04m | Avg:  6m 52s | Max: 30m 31s | Hits:  99%/2193  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 15m 21s | Avg:  3m 50s | Max:  4m 03s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 11m | Avg:  4m 44s | Max: 13m 15s | Hits:  99%/731   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 14m 47s | Avg:  4m 55s | Max:  5m 15s
      🟩 nvcc12.5           Pass: 100%/4   | Total: 31m 29s | Avg:  7m 52s | Max:  8m 17s
      🟩 nvcc12.6           Pass: 100%/84  | Total:  9h 49m | Avg:  7m 00s | Max: 30m 31s | Hits:  99%/2193  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 15m 21s | Avg:  3m 50s | Max:  4m 03s
      🟩 nvcc               Pass: 100%/106 | Total: 11h 46m | Avg:  6m 40s | Max: 30m 31s | Hits:  99%/2924  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 29m 43s | Avg:  4m 57s | Max:  5m 53s
      🟩 Clang10            Pass: 100%/3   | Total: 18m 15s | Avg:  6m 05s | Max:  6m 26s
      🟩 Clang11            Pass: 100%/4   | Total: 19m 46s | Avg:  4m 56s | Max:  5m 06s
      🟩 Clang12            Pass: 100%/4   | Total: 19m 15s | Avg:  4m 48s | Max:  5m 00s
      🟩 Clang13            Pass: 100%/4   | Total: 21m 51s | Avg:  5m 27s | Max:  7m 14s
      🟩 Clang14            Pass: 100%/4   | Total: 19m 17s | Avg:  4m 49s | Max:  5m 05s
      🟩 Clang15            Pass: 100%/4   | Total: 21m 54s | Avg:  5m 28s | Max:  6m 24s
      🟩 Clang16            Pass: 100%/4   | Total: 20m 03s | Avg:  5m 00s | Max:  5m 26s
      🟩 Clang17            Pass: 100%/4   | Total: 19m 44s | Avg:  4m 56s | Max:  5m 12s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 33m | Avg:  8m 30s | Max: 30m 31s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 05s | Avg:  4m 02s | Max:  4m 14s
      🟩 GCC7               Pass: 100%/6   | Total: 26m 37s | Avg:  4m 26s | Max:  4m 51s
      🟩 GCC8               Pass: 100%/6   | Total: 28m 59s | Avg:  4m 49s | Max:  7m 01s
      🟩 GCC9               Pass: 100%/6   | Total: 27m 22s | Avg:  4m 33s | Max:  5m 14s
      🟩 GCC10              Pass: 100%/4   | Total: 21m 53s | Avg:  5m 28s | Max:  7m 10s
      🟩 GCC11              Pass: 100%/7   | Total: 35m 25s | Avg:  5m 03s | Max:  5m 27s
      🟩 GCC12              Pass: 100%/4   | Total: 20m 54s | Avg:  5m 13s | Max:  5m 23s
      🟩 GCC13              Pass: 100%/16  | Total:  2h 53m | Avg: 10m 51s | Max: 29m 14s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 17m 00s | Avg:  5m 40s | Max:  6m 03s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 13m 15s | Avg: 13m 15s | Max: 13m 15s | Hits:  99%/731   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 22m 13s | Avg: 11m 06s | Max: 11m 15s | Hits:  99%/1462  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 35s | Avg: 11m 35s | Max: 11m 35s | Hits:  99%/731   
      🟩 NVHPC24.7          Pass: 100%/4   | Total: 31m 29s | Avg:  7m 52s | Max:  8m 17s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  4h 43m | Avg:  5m 54s | Max: 30m 31s
      🟩 GCC                Pass: 100%/51  | Total:  5h 43m | Avg:  6m 43s | Max: 29m 14s
      🟩 Intel              Pass: 100%/3   | Total: 17m 00s | Avg:  5m 40s | Max:  6m 03s
      🟩 MSVC               Pass: 100%/4   | Total: 47m 03s | Avg: 11m 45s | Max: 13m 15s | Hits:  99%/2924  
      🟩 NVHPC              Pass: 100%/4   | Total: 31m 29s | Avg:  7m 52s | Max:  8m 17s
    🟩 gpu
      🟩 v100               Pass: 100%/110 | Total: 12h 02m | Avg:  6m 33s | Max: 30m 31s | Hits:  99%/2924  
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total:  8h 57m | Avg:  5m 16s | Max: 13m 15s | Hits:  99%/2924  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 24m 02s | Avg: 24m 02s | Max: 24m 02s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 26s | Avg: 16m 26s | Max: 16m 26s
      🟩 HostLaunch         Pass: 100%/3   | Total: 58m 51s | Avg: 19m 37s | Max: 25m 01s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 25m | Avg: 28m 20s | Max: 30m 31s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 14m 47s | Avg:  4m 55s | Max:  5m 15s
      🟩 90a                Pass: 100%/4   | Total: 15m 51s | Avg:  3m 57s | Max:  4m 14s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  3h 03m | Avg:  6m 06s | Max: 25m 17s
      🟩 14                 Pass: 100%/29  | Total:  2h 35m | Avg:  5m 22s | Max: 13m 15s | Hits:  99%/1462  
      🟩 17                 Pass: 100%/27  | Total:  2h 21m | Avg:  5m 13s | Max: 11m 15s | Hits:  99%/731   
      🟩 20                 Pass: 100%/24  | Total:  4h 01m | Avg: 10m 04s | Max: 30m 31s | Hits:  99%/731   
    
  • 🟩 thrust: Pass: 100%/109 | Total: 12h 11m | Avg: 6m 42s | Max: 22m 01s | Hits: 99%/13165

    🟩 cpu
      🟩 amd64              Pass: 100%/101 | Total: 11h 31m | Avg:  6m 50s | Max: 22m 01s | Hits:  99%/13165 
      🟩 arm64              Pass: 100%/8   | Total: 40m 24s | Avg:  5m 03s | Max:  6m 07s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 21m | Avg:  5m 25s | Max: 18m 08s | Hits:  99%/2633  
      🟩 11.8               Pass: 100%/3   | Total: 18m 10s | Avg:  6m 03s | Max:  7m 23s
      🟩 12.5               Pass: 100%/4   | Total:  1h 07m | Avg: 16m 58s | Max: 18m 13s
      🟩 12.6               Pass: 100%/87  | Total:  9h 24m | Avg:  6m 29s | Max: 22m 01s | Hits:  99%/10532 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 20m 40s | Avg:  5m 10s | Max:  5m 23s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 21m | Avg:  5m 25s | Max: 18m 08s | Hits:  99%/2633  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 18m 10s | Avg:  6m 03s | Max:  7m 23s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  1h 07m | Avg: 16m 58s | Max: 18m 13s
      🟩 nvcc12.6           Pass: 100%/83  | Total:  9h 03m | Avg:  6m 33s | Max: 22m 01s | Hits:  99%/10532 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 20m 40s | Avg:  5m 10s | Max:  5m 23s
      🟩 nvcc               Pass: 100%/105 | Total: 11h 51m | Avg:  6m 46s | Max: 22m 01s | Hits:  99%/13165 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 33m 46s | Avg:  5m 37s | Max:  6m 50s
      🟩 Clang10            Pass: 100%/3   | Total: 20m 19s | Avg:  6m 46s | Max:  7m 28s
      🟩 Clang11            Pass: 100%/4   | Total: 21m 18s | Avg:  5m 19s | Max:  5m 53s
      🟩 Clang12            Pass: 100%/4   | Total: 21m 40s | Avg:  5m 25s | Max:  5m 44s
      🟩 Clang13            Pass: 100%/4   | Total: 21m 38s | Avg:  5m 24s | Max:  5m 55s
      🟩 Clang14            Pass: 100%/4   | Total: 21m 31s | Avg:  5m 22s | Max:  5m 44s
      🟩 Clang15            Pass: 100%/4   | Total: 22m 01s | Avg:  5m 30s | Max:  6m 03s
      🟩 Clang16            Pass: 100%/4   | Total: 22m 39s | Avg:  5m 39s | Max:  6m 37s
      🟩 Clang17            Pass: 100%/4   | Total: 21m 49s | Avg:  5m 27s | Max:  6m 07s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 10m | Avg:  6m 24s | Max: 17m 38s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 30s | Avg:  4m 15s | Max:  4m 23s
      🟩 GCC7               Pass: 100%/6   | Total: 29m 35s | Avg:  4m 55s | Max:  5m 46s
      🟩 GCC8               Pass: 100%/6   | Total: 29m 33s | Avg:  4m 55s | Max:  5m 45s
      🟩 GCC9               Pass: 100%/6   | Total: 29m 15s | Avg:  4m 52s | Max:  5m 42s
      🟩 GCC10              Pass: 100%/4   | Total: 21m 09s | Avg:  5m 17s | Max:  5m 30s
      🟩 GCC11              Pass: 100%/7   | Total: 40m 59s | Avg:  5m 51s | Max:  7m 23s
      🟩 GCC12              Pass: 100%/4   | Total: 23m 12s | Avg:  5m 48s | Max:  6m 03s
      🟩 GCC13              Pass: 100%/14  | Total:  1h 36m | Avg:  6m 51s | Max: 16m 09s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 20m 48s | Avg:  6m 56s | Max:  7m 43s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 18m 08s | Avg: 18m 08s | Max: 18m 08s | Hits:  99%/2633  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 31m 10s | Avg: 15m 35s | Max: 16m 32s | Hits:  99%/5266  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 38m 28s | Avg: 19m 14s | Max: 22m 01s | Hits:  99%/5266  
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  1h 07m | Avg: 16m 58s | Max: 18m 13s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  4h 37m | Avg:  5m 46s | Max: 17m 38s
      🟩 GCC                Pass: 100%/49  | Total:  4h 38m | Avg:  5m 40s | Max: 16m 09s
      🟩 Intel              Pass: 100%/3   | Total: 20m 48s | Avg:  6m 56s | Max:  7m 43s
      🟩 MSVC               Pass: 100%/5   | Total:  1h 27m | Avg: 17m 33s | Max: 22m 01s | Hits:  99%/13165 
      🟩 NVHPC              Pass: 100%/4   | Total:  1h 07m | Avg: 16m 58s | Max: 18m 13s
    🟩 gpu
      🟩 v100               Pass: 100%/109 | Total: 12h 11m | Avg:  6m 42s | Max: 22m 01s | Hits:  99%/13165 
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total: 10h 39m | Avg:  6m 16s | Max: 18m 13s | Hits:  99%/10532 
      🟩 TestCPU            Pass: 100%/4   | Total: 44m 02s | Avg: 11m 00s | Max: 22m 01s | Hits:  99%/2633  
      🟩 TestGPU            Pass: 100%/3   | Total: 48m 00s | Avg: 16m 00s | Max: 17m 38s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 18m 10s | Avg:  6m 03s | Max:  7m 23s
      🟩 90a                Pass: 100%/4   | Total: 18m 26s | Avg:  4m 36s | Max:  4m 41s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 53m | Avg:  5m 47s | Max: 17m 24s
      🟩 14                 Pass: 100%/29  | Total:  3h 10m | Avg:  6m 34s | Max: 18m 08s | Hits:  99%/5266  
      🟩 17                 Pass: 100%/27  | Total:  2h 53m | Avg:  6m 25s | Max: 16m 45s | Hits:  99%/2633  
      🟩 20                 Pass: 100%/23  | Total:  3h 13m | Avg:  8m 25s | Max: 22m 01s | Hits:  99%/5266  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 04s | Avg: 5m 02s | Max: 7m 53s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  7m 53s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  7m 53s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  7m 53s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  7m 53s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  7m 53s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  7m 53s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  7m 53s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s
      🟩 Test               Pass: 100%/1   | Total:  7m 53s | Avg:  7m 53s | Max:  7m 53s
    
  • 🟩 python: Pass: 100%/1 | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 19m 26s | Avg: 19m 26s | Max: 19m 26s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 222)

# Runner
184 linux-amd64-cpu16
16 linux-arm64-cpu16
13 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16

@bernhardmgruber bernhardmgruber merged commit 628714d into NVIDIA:main Oct 30, 2024
236 checks passed
@bernhardmgruber bernhardmgruber deleted the transform_no_for branch October 30, 2024 14:19
fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 5, 2024