Unify kernel dispatch paths for device reduce between CUB and c.parallel. #2591

Merged: 8 commits, Oct 23, 2024


griwes (Collaborator) commented Oct 17, 2024

Description

This PR removes the duplicated kernel dispatch logic from c.parallel's device reduce, adapts the CUB dispatch layer to support the CUDA driver + CUfunction use case, and then replaces the removed code in c.parallel with a call to the CUB dispatch layer.

This is achieved by adding two new template parameters to DispatchReduce:

  • KernelSource, a type that the dispatch layer uses to select which kernels to launch; the C library provides its own, which returns the precompiled kernels, while the default kernel source instantiates the kernels as before; and
  • KernelLauncherFactory, which specifies how the kernels are launched and provides a way to obtain occupancy information about the target device; the default one uses the same CUDA runtime functions as the original implementation, and the C library overrides it with one that calls CUDA driver functions directly.
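
The two customization points described above can be sketched in plain, host-only C++. This is a minimal illustration under stated assumptions, not the actual CUB code: every name below (DefaultKernelSource, PrecompiledKernelSource, RuntimeLauncherFactory, DriverLauncherFactory, dispatch_reduce, max_blocks) is hypothetical, kernel handles are replaced by strings, and launches are recorded in a log instead of going through the CUDA runtime or driver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical kernel handles. In a real build the first would be a
// __global__ function pointer and the second a CUfunction.
struct TemplateKernel { std::string name; };
struct PrecompiledKernel { std::string name; };

// Default kernel source: stands in for instantiating the kernel templates.
struct DefaultKernelSource {
  TemplateKernel single_tile_kernel() const { return {"single_tile<instantiated>"}; }
  TemplateKernel reduction_kernel() const { return {"reduce<instantiated>"}; }
};

// C-library-style kernel source: hands back kernels that were built earlier.
struct PrecompiledKernelSource {
  PrecompiledKernel single_tile;
  PrecompiledKernel reduction;
  PrecompiledKernel single_tile_kernel() const { return single_tile; }
  PrecompiledKernel reduction_kernel() const { return reduction; }
};

// Launcher factories: each one knows how to launch its own handle type and
// answers an occupancy-style query. Both only append to a log here.
struct RuntimeLauncherFactory {
  std::vector<std::string>* log;
  std::size_t max_blocks() const { return 64; } // placeholder occupancy answer
  void launch(const TemplateKernel& k, std::size_t grid) const {
    log->push_back("runtime " + k.name + " x" + std::to_string(grid));
  }
};

struct DriverLauncherFactory {
  std::vector<std::string>* log;
  std::size_t max_blocks() const { return 64; } // placeholder occupancy answer
  void launch(const PrecompiledKernel& k, std::size_t grid) const {
    log->push_back("driver " + k.name + " x" + std::to_string(grid));
  }
};

// One dispatch routine shared by both configurations. Returns the number of
// kernel launches it issued.
template <class KernelSource, class LauncherFactory>
std::size_t dispatch_reduce(std::size_t num_items, std::size_t tile_size,
                            const KernelSource& source,
                            const LauncherFactory& launcher) {
  assert(tile_size > 0);
  if (num_items <= tile_size) {
    // Small input: a single block reduces everything.
    launcher.launch(source.single_tile_kernel(), 1);
    return 1;
  }
  // Large input: a multi-block pass capped by occupancy, then a final
  // single-block pass over the per-block partial results.
  const std::size_t tiles = (num_items + tile_size - 1) / tile_size;
  const std::size_t grid = std::min(tiles, launcher.max_blocks());
  launcher.launch(source.reduction_kernel(), grid);
  launcher.launch(source.single_tile_kernel(), 1);
  return 2;
}
```

In this sketch the dispatch logic is written once; passing a different source and launcher pair changes only which kernel handles are used and which launch path records them, which is the kind of sharing the description attributes to the unified dispatch layer.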

Resolves #2448; the issue suggests that the for algorithm should also be unified, but its dispatch layer is so thin that this is not worth the effort right now.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Contributor

🟨 CI finished in 2h 09m: Pass: 90%/210 | Total: 5d 20h | Avg: 40m 01s | Max: 1h 14m | Hits: 68%/13095
  • 🟨 cub: Pass: 80%/104 | Total: 3d 12h | Avg: 48m 44s | Max: 1h 14m

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  79%/96  | Total:  3d 05h | Avg: 48m 13s | Max:  1h 14m
      🟩 arm64              Pass: 100%/8   | Total:  7h 19m | Avg: 54m 55s | Max:  1h 01m
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  79%/96  | Total:  3d 09h | Avg: 50m 44s | Max:  1h 14m
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 04s | Avg: 23m 04s | Max: 23m 04s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 25s | Avg: 17m 25s | Max: 17m 25s
      🟩 HostLaunch         Pass: 100%/3   | Total: 58m 03s | Avg: 19m 21s | Max: 21m 01s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 40m | Avg: 33m 29s | Max: 38m 23s
    🟨 ctk
      🟥 11.1               Pass:   0%/15  | Total: 11h 00m | Avg: 44m 02s | Max: 48m 36s
      🟩 11.8               Pass: 100%/3   | Total:  3h 41m | Avg:  1h 13m | Max:  1h 14m
      🟨 12.6               Pass:  94%/86  | Total:  2d 21h | Avg: 48m 41s | Max:  1h 05m
    🟨 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  1h 54m | Avg: 57m 06s | Max: 57m 52s
      🟥 nvcc11.1           Pass:   0%/15  | Total: 11h 00m | Avg: 44m 02s | Max: 48m 36s
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 41m | Avg:  1h 13m | Max:  1h 14m
      🟨 nvcc12.6           Pass:  96%/84  | Total:  2d 19h | Avg: 48m 29s | Max:  1h 05m
    🟨 cxx
      🟨 Clang9             Pass:  50%/6   | Total:  4h 53m | Avg: 48m 57s | Max: 56m 40s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 44m | Avg: 54m 58s | Max: 56m 59s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 35m | Avg: 53m 46s | Max: 55m 55s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 19s | Max: 56m 24s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 53m | Avg: 58m 15s | Max:  1h 05m
      🟩 Clang14            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 35s | Max: 57m 09s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 38m | Avg: 54m 44s | Max: 56m 28s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 23m | Avg: 50m 49s | Max: 53m 54s
      🟩 Clang17            Pass: 100%/4   | Total:  3h 28m | Avg: 52m 11s | Max: 55m 03s
      🟨 Clang18            Pass:  77%/9   | Total:  7h 14m | Avg: 48m 13s | Max: 57m 52s
      🟥 GCC6               Pass:   0%/2   | Total:  1h 32m | Avg: 46m 27s | Max: 48m 36s
      🟨 GCC7               Pass:  50%/6   | Total:  5h 08m | Avg: 51m 22s | Max: 57m 30s
      🟨 GCC8               Pass:  50%/6   | Total:  4h 45m | Avg: 47m 33s | Max: 51m 03s
      🟨 GCC9               Pass:  50%/6   | Total:  4h 56m | Avg: 49m 23s | Max: 57m 49s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 38m | Avg: 54m 37s | Max: 56m 43s
      🟩 GCC11              Pass: 100%/7   | Total:  7h 15m | Avg:  1h 02m | Max:  1h 14m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 36m | Avg: 54m 00s | Max: 57m 43s
      🟩 GCC13              Pass: 100%/16  | Total:  9h 32m | Avg: 35m 47s | Max:  1h 01m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 50m | Avg: 56m 50s | Max: 59m 59s
      🟥 MSVC14.16          Pass:   0%/1   | Total: 22m 28s | Avg: 22m 28s | Max: 22m 28s
      🟥 MSVC14.29          Pass:   0%/2   | Total: 41m 42s | Avg: 20m 51s | Max: 20m 59s
      🟥 MSVC14.39          Pass:   0%/1   | Total: 21m 42s | Avg: 21m 42s | Max: 21m 42s
    🟨 cxx_family
      🟨 Clang              Pass:  89%/46  | Total:  1d 15h | Avg: 51m 53s | Max:  1h 05m
      🟨 GCC                Pass:  78%/51  | Total:  1d 16h | Avg: 47m 33s | Max:  1h 14m
      🟩 Intel              Pass: 100%/3   | Total:  2h 50m | Avg: 56m 50s | Max: 59m 59s
      🟥 MSVC               Pass:   0%/4   | Total:  1h 25m | Avg: 21m 28s | Max: 22m 28s
    🟨 gpu
      🟨 v100               Pass:  80%/104 | Total:  3d 12h | Avg: 48m 44s | Max:  1h 14m
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  1h 54m | Avg: 57m 06s | Max: 57m 52s
      🟨 nvcc               Pass:  82%/102 | Total:  3d 10h | Avg: 48m 34s | Max:  1h 14m
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 41m | Avg:  1h 13m | Max:  1h 14m
      🟩 90a                Pass: 100%/4   | Total:  1h 34m | Avg: 23m 38s | Max: 24m 39s
    🟨 std
      🟨 11                 Pass:  82%/28  | Total: 23h 27m | Avg: 50m 15s | Max:  1h 14m
      🟨 14                 Pass:  74%/27  | Total: 22h 14m | Avg: 49m 25s | Max:  1h 12m
      🟨 17                 Pass:  76%/26  | Total: 21h 55m | Avg: 50m 35s | Max:  1h 14m
      🟨 20                 Pass:  91%/23  | Total: 16h 52m | Avg: 44m 01s | Max:  1h 05m
    
  • 🟥 pycuda: Pass: 0%/1 | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    🟥 ctk
      🟥 12.5               Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    🟥 cudacxx
      🟥 nvcc12.5           Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 14m 44s | Avg: 14m 44s | Max: 14m 44s
    
  • 🟩 thrust: Pass: 100%/103 | Total: 2d 07h | Avg: 32m 08s | Max: 1h 08m | Hits: 68%/13095

    🟩 cpu
      🟩 amd64              Pass: 100%/95  | Total:  2d 03h | Avg: 32m 17s | Max:  1h 08m | Hits:  68%/13095 
      🟩 arm64              Pass: 100%/8   | Total:  4h 02m | Avg: 30m 22s | Max: 34m 29s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  8h 18m | Avg: 33m 12s | Max:  1h 08m | Hits:  61%/2619  
      🟩 11.8               Pass: 100%/3   | Total:  2h 00m | Avg: 40m 03s | Max: 42m 53s
      🟩 12.6               Pass: 100%/85  | Total:  1d 20h | Avg: 31m 40s | Max:  1h 06m | Hits:  70%/10476 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 01s | Avg: 28m 00s | Max: 29m 52s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  8h 18m | Avg: 33m 12s | Max:  1h 08m | Hits:  61%/2619  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 00m | Avg: 40m 03s | Max: 42m 53s
      🟩 nvcc12.6           Pass: 100%/83  | Total:  1d 19h | Avg: 31m 45s | Max:  1h 06m | Hits:  70%/10476 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 01s | Avg: 28m 00s | Max: 29m 52s
      🟩 nvcc               Pass: 100%/101 | Total:  2d 06h | Avg: 32m 13s | Max:  1h 08m | Hits:  68%/13095 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 11m | Avg: 31m 55s | Max: 39m 16s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 44m | Avg: 34m 57s | Max: 39m 10s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 01m | Avg: 30m 29s | Max: 33m 49s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 41s | Max: 37m 29s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 42s | Max: 35m 46s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 17s | Max: 37m 15s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 50s | Max: 34m 32s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 29s | Max: 35m 20s
      🟩 Clang17            Pass: 100%/4   | Total:  2h 12m | Avg: 33m 10s | Max: 37m 37s
      🟩 Clang18            Pass: 100%/9   | Total:  3h 47m | Avg: 25m 19s | Max: 33m 07s
      🟩 GCC6               Pass: 100%/2   | Total: 57m 31s | Avg: 28m 45s | Max: 33m 32s
      🟩 GCC7               Pass: 100%/6   | Total:  3h 16m | Avg: 32m 49s | Max: 38m 22s
      🟩 GCC8               Pass: 100%/6   | Total:  3h 09m | Avg: 31m 35s | Max: 38m 53s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 08m | Avg: 31m 24s | Max: 34m 20s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 10m | Avg: 32m 37s | Max: 37m 32s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 23m | Avg: 37m 38s | Max: 42m 53s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 12m | Avg: 33m 06s | Max: 36m 47s
      🟩 GCC13              Pass: 100%/14  | Total:  5h 20m | Avg: 22m 53s | Max: 36m 35s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 04m | Avg: 41m 24s | Max: 43m 49s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 08m | Avg:  1h 08m | Max:  1h 08m | Hits:  61%/2619  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 45s | Max:  1h 02m | Hits:  61%/5238  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 31m | Avg: 45m 56s | Max:  1h 06m | Hits:  80%/5238  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total: 23h 46m | Avg: 31m 01s | Max: 39m 16s
      🟩 GCC                Pass: 100%/49  | Total:  1d 00h | Avg: 30m 11s | Max: 42m 53s
      🟩 Intel              Pass: 100%/3   | Total:  2h 04m | Avg: 41m 24s | Max: 43m 49s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 39m | Avg: 55m 59s | Max:  1h 08m | Hits:  68%/13095 
    🟩 gpu
      🟩 v100               Pass: 100%/103 | Total:  2d 07h | Avg: 32m 08s | Max:  1h 08m | Hits:  68%/13095 
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  2d 05h | Avg: 33m 31s | Max:  1h 08m | Hits:  61%/10476 
      🟩 TestCPU            Pass: 100%/4   | Total: 50m 03s | Avg: 12m 30s | Max: 25m 03s | Hits:  99%/2619  
      🟩 TestGPU            Pass: 100%/3   | Total: 41m 49s | Avg: 13m 56s | Max: 15m 00s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 00m | Avg: 40m 03s | Max: 42m 53s
      🟩 90a                Pass: 100%/4   | Total:  1h 21m | Avg: 20m 28s | Max: 23m 12s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total: 12h 11m | Avg: 26m 07s | Max: 38m 47s
      🟩 14                 Pass: 100%/27  | Total: 16h 01m | Avg: 35m 36s | Max:  1h 08m | Hits:  61%/5238  
      🟩 17                 Pass: 100%/26  | Total: 15h 37m | Avg: 36m 02s | Max:  1h 02m | Hits:  61%/2619  
      🟩 20                 Pass: 100%/22  | Total: 11h 20m | Avg: 30m 55s | Max:  1h 06m | Hits:  80%/5238  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 23s | Avg: 5m 41s | Max: 8m 37s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  8m 37s
    🟩 ctk
      🟩 12.5               Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  8m 37s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  8m 37s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  8m 37s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  8m 37s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  8m 37s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  8m 37s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
      🟩 Test               Pass: 100%/1   | Total:  8m 37s | Avg:  8m 37s | Max:  8m 37s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
+/- CCCL C Parallel Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CCCL C Parallel Library

🏃‍ Runner counts (total jobs: 210)

# Runner
172 linux-amd64-cpu16
16 linux-arm64-cpu16
13 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16

Contributor

🟨 CI finished in 49m 49s: Pass: 99%/210 | Total: 22h 17m | Avg: 6m 22s | Max: 26m 05s | Hits: 99%/16011
  • 🟥 pycuda: Pass: 0%/1 | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    🟥 ctk
      🟥 12.5               Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    🟥 cudacxx
      🟥 nvcc12.5           Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s
    
  • 🟩 cub: Pass: 100%/104 | Total: 11h 01m | Avg: 6m 21s | Max: 25m 31s | Hits: 99%/2916

    🟩 cpu
      🟩 amd64              Pass: 100%/96  | Total: 10h 25m | Avg:  6m 30s | Max: 25m 31s | Hits:  99%/2916  
      🟩 arm64              Pass: 100%/8   | Total: 35m 45s | Avg:  4m 28s | Max:  5m 40s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 13m | Avg:  4m 55s | Max: 15m 53s | Hits:  99%/729   
      🟩 11.8               Pass: 100%/3   | Total: 14m 52s | Avg:  4m 57s | Max:  5m 14s
      🟩 12.6               Pass: 100%/86  | Total:  9h 32m | Avg:  6m 39s | Max: 25m 31s | Hits:  98%/2187  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 15s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 13m | Avg:  4m 55s | Max: 15m 53s | Hits:  99%/729   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 14m 52s | Avg:  4m 57s | Max:  5m 14s
      🟩 nvcc12.6           Pass: 100%/84  | Total:  9h 24m | Avg:  6m 43s | Max: 25m 31s | Hits:  98%/2187  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 15s
      🟩 nvcc               Pass: 100%/102 | Total: 10h 52m | Avg:  6m 24s | Max: 25m 31s | Hits:  99%/2916  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 29m 09s | Avg:  4m 51s | Max:  5m 30s
      🟩 Clang10            Pass: 100%/3   | Total: 16m 51s | Avg:  5m 37s | Max:  5m 56s
      🟩 Clang11            Pass: 100%/4   | Total: 19m 03s | Avg:  4m 45s | Max:  4m 52s
      🟩 Clang12            Pass: 100%/4   | Total: 18m 11s | Avg:  4m 32s | Max:  4m 42s
      🟩 Clang13            Pass: 100%/4   | Total: 18m 33s | Avg:  4m 38s | Max:  4m 45s
      🟩 Clang14            Pass: 100%/4   | Total: 19m 06s | Avg:  4m 46s | Max:  4m 55s
      🟩 Clang15            Pass: 100%/4   | Total: 19m 54s | Avg:  4m 58s | Max:  5m 05s
      🟩 Clang16            Pass: 100%/4   | Total: 19m 22s | Avg:  4m 50s | Max:  5m 10s
      🟩 Clang17            Pass: 100%/4   | Total: 19m 13s | Avg:  4m 48s | Max:  5m 10s
      🟩 Clang18            Pass: 100%/9   | Total:  1h 19m | Avg:  8m 52s | Max: 25m 31s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 38s | Avg:  3m 49s | Max:  3m 52s
      🟩 GCC7               Pass: 100%/6   | Total: 26m 52s | Avg:  4m 28s | Max:  4m 44s
      🟩 GCC8               Pass: 100%/6   | Total: 25m 12s | Avg:  4m 12s | Max:  4m 34s
      🟩 GCC9               Pass: 100%/6   | Total: 26m 51s | Avg:  4m 28s | Max:  5m 15s
      🟩 GCC10              Pass: 100%/4   | Total: 18m 58s | Avg:  4m 44s | Max:  5m 06s
      🟩 GCC11              Pass: 100%/7   | Total: 33m 54s | Avg:  4m 50s | Max:  5m 14s
      🟩 GCC12              Pass: 100%/4   | Total: 19m 30s | Avg:  4m 52s | Max:  4m 58s
      🟩 GCC13              Pass: 100%/16  | Total:  2h 46m | Avg: 10m 23s | Max: 25m 01s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 16m 50s | Avg:  5m 36s | Max:  5m 50s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 53s | Avg: 15m 53s | Max: 15m 53s | Hits:  99%/729   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 28m 52s | Avg: 14m 26s | Max: 14m 31s | Hits:  98%/1458  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 15m 12s | Avg: 15m 12s | Max: 15m 12s | Hits:  99%/729   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total:  4h 19m | Avg:  5m 38s | Max: 25m 31s
      🟩 GCC                Pass: 100%/51  | Total:  5h 25m | Avg:  6m 22s | Max: 25m 01s
      🟩 Intel              Pass: 100%/3   | Total: 16m 50s | Avg:  5m 36s | Max:  5m 50s
      🟩 MSVC               Pass: 100%/4   | Total: 59m 57s | Avg: 14m 59s | Max: 15m 53s | Hits:  99%/2916  
    🟩 gpu
      🟩 v100               Pass: 100%/104 | Total: 11h 01m | Avg:  6m 21s | Max: 25m 31s | Hits:  99%/2916  
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  8h 09m | Avg:  5m 06s | Max: 15m 53s | Hits:  99%/2916  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 17s | Avg: 20m 17s | Max: 20m 17s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 16s | Avg: 17m 16s | Max: 17m 16s
      🟩 HostLaunch         Pass: 100%/3   | Total: 59m 12s | Avg: 19m 44s | Max: 22m 38s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 14m | Avg: 24m 52s | Max: 25m 31s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 14m 52s | Avg:  4m 57s | Max:  5m 14s
      🟩 90a                Pass: 100%/4   | Total: 15m 34s | Avg:  3m 53s | Max:  4m 22s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total:  2h 45m | Avg:  5m 55s | Max: 24m 06s
      🟩 14                 Pass: 100%/27  | Total:  2h 23m | Avg:  5m 19s | Max: 15m 53s | Hits:  98%/1458  
      🟩 17                 Pass: 100%/26  | Total:  2h 14m | Avg:  5m 11s | Max: 14m 21s | Hits:  99%/729   
      🟩 20                 Pass: 100%/23  | Total:  3h 36m | Avg:  9m 25s | Max: 25m 31s | Hits:  99%/729   
    
  • 🟩 thrust: Pass: 100%/103 | Total: 10h 48m | Avg: 6m 17s | Max: 26m 05s | Hits: 99%/13095

    🟩 cpu
      🟩 amd64              Pass: 100%/95  | Total: 10h 09m | Avg:  6m 25s | Max: 26m 05s | Hits:  99%/13095 
      🟩 arm64              Pass: 100%/8   | Total: 38m 58s | Avg:  4m 52s | Max:  5m 36s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 24m | Avg:  5m 38s | Max: 21m 21s | Hits:  99%/2619  
      🟩 11.8               Pass: 100%/3   | Total: 15m 36s | Avg:  5m 12s | Max:  5m 55s
      🟩 12.6               Pass: 100%/85  | Total:  9h 08m | Avg:  6m 27s | Max: 26m 05s | Hits:  99%/10476 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 20s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 24m | Avg:  5m 38s | Max: 21m 21s | Hits:  99%/2619  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 15m 36s | Avg:  5m 12s | Max:  5m 55s
      🟩 nvcc12.6           Pass: 100%/83  | Total:  8h 58m | Avg:  6m 28s | Max: 26m 05s | Hits:  99%/10476 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 20s
      🟩 nvcc               Pass: 100%/101 | Total: 10h 38m | Avg:  6m 19s | Max: 26m 05s | Hits:  99%/13095 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 32m 46s | Avg:  5m 27s | Max:  6m 29s
      🟩 Clang10            Pass: 100%/3   | Total: 19m 23s | Avg:  6m 27s | Max:  6m 38s
      🟩 Clang11            Pass: 100%/4   | Total: 20m 08s | Avg:  5m 02s | Max:  5m 23s
      🟩 Clang12            Pass: 100%/4   | Total: 20m 25s | Avg:  5m 06s | Max:  5m 31s
      🟩 Clang13            Pass: 100%/4   | Total: 21m 31s | Avg:  5m 22s | Max:  5m 48s
      🟩 Clang14            Pass: 100%/4   | Total: 22m 30s | Avg:  5m 37s | Max:  6m 13s
      🟩 Clang15            Pass: 100%/4   | Total: 21m 27s | Avg:  5m 21s | Max:  5m 49s
      🟩 Clang16            Pass: 100%/4   | Total: 21m 26s | Avg:  5m 21s | Max:  5m 55s
      🟩 Clang17            Pass: 100%/4   | Total: 22m 13s | Avg:  5m 33s | Max:  5m 59s
      🟩 Clang18            Pass: 100%/9   | Total: 54m 10s | Avg:  6m 01s | Max: 12m 10s
      🟩 GCC6               Pass: 100%/2   | Total:  9m 27s | Avg:  4m 43s | Max:  5m 04s
      🟩 GCC7               Pass: 100%/6   | Total: 27m 48s | Avg:  4m 38s | Max:  5m 22s
      🟩 GCC8               Pass: 100%/6   | Total: 28m 42s | Avg:  4m 47s | Max:  5m 17s
      🟩 GCC9               Pass: 100%/6   | Total: 29m 11s | Avg:  4m 51s | Max:  6m 04s
      🟩 GCC10              Pass: 100%/4   | Total: 20m 49s | Avg:  5m 12s | Max:  5m 59s
      🟩 GCC11              Pass: 100%/7   | Total: 36m 30s | Avg:  5m 12s | Max:  5m 55s
      🟩 GCC12              Pass: 100%/4   | Total: 22m 24s | Avg:  5m 36s | Max:  5m 55s
      🟩 GCC13              Pass: 100%/14  | Total:  1h 35m | Avg:  6m 48s | Max: 13m 56s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 19m 40s | Avg:  6m 33s | Max:  6m 55s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 21m 21s | Avg: 21m 21s | Max: 21m 21s | Hits:  99%/2619  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 36m 29s | Avg: 18m 14s | Max: 19m 20s | Hits:  99%/5238  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 45m 01s | Avg: 22m 30s | Max: 26m 05s | Hits:  99%/5238  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total:  4h 15m | Avg:  5m 33s | Max: 12m 10s
      🟩 GCC                Pass: 100%/49  | Total:  4h 30m | Avg:  5m 30s | Max: 13m 56s
      🟩 Intel              Pass: 100%/3   | Total: 19m 40s | Avg:  6m 33s | Max:  6m 55s
      🟩 MSVC               Pass: 100%/5   | Total:  1h 42m | Avg: 20m 34s | Max: 26m 05s | Hits:  99%/13095 
    🟩 gpu
      🟩 v100               Pass: 100%/103 | Total: 10h 48m | Avg:  6m 17s | Max: 26m 05s | Hits:  99%/13095 
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  9h 17m | Avg:  5m 48s | Max: 21m 21s | Hits:  99%/10476 
      🟩 TestCPU            Pass: 100%/4   | Total: 52m 24s | Avg: 13m 06s | Max: 26m 05s | Hits:  99%/2619  
      🟩 TestGPU            Pass: 100%/3   | Total: 39m 07s | Avg: 13m 02s | Max: 13m 56s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 15m 36s | Avg:  5m 12s | Max:  5m 55s
      🟩 90a                Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  4m 59s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total:  2h 28m | Avg:  5m 19s | Max: 13m 01s
      🟩 14                 Pass: 100%/27  | Total:  2h 46m | Avg:  6m 10s | Max: 21m 21s | Hits:  99%/5238  
      🟩 17                 Pass: 100%/26  | Total:  2h 37m | Avg:  6m 04s | Max: 19m 20s | Hits:  99%/2619  
      🟩 20                 Pass: 100%/22  | Total:  2h 55m | Avg:  7m 57s | Max: 26m 05s | Hits:  99%/5238  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 57s | Avg: 5m 58s | Max: 9m 37s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 37s
    🟩 ctk
      🟩 12.5               Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 37s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 37s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 37s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 37s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 37s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  9m 37s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 20s | Avg:  2m 20s | Max:  2m 20s
      🟩 Test               Pass: 100%/1   | Total:  9m 37s | Avg:  9m 37s | Max:  9m 37s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
+/- CCCL C Parallel Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CCCL C Parallel Library

🏃‍ Runner counts (total jobs: 210)

# Runner
172 linux-amd64-cpu16
16 linux-arm64-cpu16
13 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16

Contributor

🟩 CI finished in 1h 44m: Pass: 100%/372 | Total: 5d 19h | Avg: 22m 32s | Max: 1h 15m | Hits: 57%/27963
  • 🟩 cub: Pass: 100%/104 | Total: 3d 00h | Avg: 41m 32s | Max: 1h 15m | Hits: 2%/2916

    🟩 cpu
      🟩 amd64              Pass: 100%/96  | Total:  2d 18h | Avg: 41m 28s | Max:  1h 15m | Hits:   2%/2916  
      🟩 arm64              Pass: 100%/8   | Total:  5h 38m | Avg: 42m 22s | Max: 56m 07s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  9h 35m | Avg: 38m 22s | Max: 57m 59s | Hits:   2%/729   
      🟩 11.8               Pass: 100%/3   | Total:  2h 38m | Avg: 52m 58s | Max:  1h 15m
      🟩 12.6               Pass: 100%/86  | Total:  2d 11h | Avg: 41m 41s | Max:  1h 07m | Hits:   2%/2187  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m
      🟩 nvcc11.1           Pass: 100%/15  | Total:  9h 35m | Avg: 38m 22s | Max: 57m 59s | Hits:   2%/729   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 38m | Avg: 52m 58s | Max:  1h 15m
      🟩 nvcc12.6           Pass: 100%/84  | Total:  2d 09h | Avg: 41m 14s | Max:  1h 07m | Hits:   2%/2187  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m
      🟩 nvcc               Pass: 100%/102 | Total:  2d 21h | Avg: 41m 09s | Max:  1h 15m | Hits:   2%/2916  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 49m | Avg: 48m 18s | Max: 53m 29s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 30m | Avg: 50m 11s | Max: 51m 35s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 31m | Avg: 52m 52s | Max: 56m 29s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 27m | Avg: 51m 51s | Max: 56m 45s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 20m | Avg: 50m 01s | Max: 52m 05s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 43m | Avg: 40m 53s | Max: 55m 16s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 42m | Avg: 40m 40s | Max: 54m 50s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 37m | Avg: 39m 22s | Max: 49m 41s
      🟩 Clang17            Pass: 100%/4   | Total:  2h 53m | Avg: 43m 26s | Max:  1h 00m
      🟩 Clang18            Pass: 100%/9   | Total:  6h 19m | Avg: 42m 09s | Max:  1h 00m
      🟩 GCC6               Pass: 100%/2   | Total:  1h 29m | Avg: 44m 33s | Max: 45m 07s
      🟩 GCC7               Pass: 100%/6   | Total:  3h 26m | Avg: 34m 22s | Max: 51m 54s
      🟩 GCC8               Pass: 100%/6   | Total:  3h 22m | Avg: 33m 49s | Max: 49m 53s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 28m | Avg: 34m 43s | Max: 50m 11s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 47m | Avg: 41m 55s | Max: 55m 58s
      🟩 GCC11              Pass: 100%/7   | Total:  5h 20m | Avg: 45m 49s | Max:  1h 15m
      🟩 GCC12              Pass: 100%/4   | Total:  2h 56m | Avg: 44m 09s | Max: 58m 12s
      🟩 GCC13              Pass: 100%/16  | Total:  7h 05m | Avg: 26m 36s | Max: 56m 07s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 53m | Avg: 57m 49s | Max:  1h 00m
      🟩 MSVC14.16          Pass: 100%/1   | Total: 57m 59s | Avg: 57m 59s | Max: 57m 59s | Hits:   2%/729   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m | Hits:   2%/1458  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  1h 07m | Avg:  1h 07m | Max:  1h 07m | Hits:   2%/729   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total:  1d 10h | Avg: 45m 34s | Max:  1h 00m
      🟩 GCC                Pass: 100%/51  | Total:  1d 05h | Avg: 35m 14s | Max:  1h 15m
      🟩 Intel              Pass: 100%/3   | Total:  2h 53m | Avg: 57m 49s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 13m | Avg:  1h 03m | Max:  1h 07m | Hits:   2%/2916  
    🟩 gpu
      🟩 v100               Pass: 100%/104 | Total:  3d 00h | Avg: 41m 32s | Max:  1h 15m | Hits:   2%/2916  
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  2d 21h | Avg: 43m 18s | Max:  1h 15m | Hits:   2%/2916  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 18m 22s | Avg: 18m 22s | Max: 18m 22s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
      🟩 HostLaunch         Pass: 100%/3   | Total: 53m 57s | Avg: 17m 59s | Max: 19m 01s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 14m | Avg: 24m 56s | Max: 32m 35s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 38m | Avg: 52m 58s | Max:  1h 15m
      🟩 90a                Pass: 100%/4   | Total:  1h 12m | Avg: 18m 06s | Max: 24m 28s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total:  9h 59m | Avg: 21m 23s | Max: 59m 44s
      🟩 14                 Pass: 100%/27  | Total: 22h 47m | Avg: 50m 39s | Max:  1h 13m | Hits:   2%/1458  
      🟩 17                 Pass: 100%/26  | Total: 22h 32m | Avg: 52m 01s | Max:  1h 15m | Hits:   2%/729   
      🟩 20                 Pass: 100%/23  | Total: 16h 41m | Avg: 43m 31s | Max:  1h 07m | Hits:   2%/729   
    
  • 🟩 libcudacxx: Pass: 100%/104 | Total: 15h 56m | Avg: 9m 11s | Max: 43m 17s | Hits: 57%/11736

    🟩 cpu
      🟩 amd64              Pass: 100%/96  | Total: 15h 08m | Avg:  9m 27s | Max: 43m 17s | Hits:  57%/11736 
      🟩 arm64              Pass: 100%/8   | Total: 48m 36s | Avg:  6m 04s | Max:  7m 15s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 55m | Avg:  7m 43s | Max: 34m 32s | Hits:  48%/2737  
      🟩 11.8               Pass: 100%/3   | Total: 24m 45s | Avg:  8m 15s | Max:  8m 51s
      🟩 12.6               Pass: 100%/86  | Total: 13h 36m | Avg:  9m 29s | Max: 43m 17s | Hits:  60%/8999  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 39m 39s | Avg: 19m 49s | Max: 22m 04s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 55m | Avg:  7m 43s | Max: 34m 32s | Hits:  48%/2737  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 24m 45s | Avg:  8m 15s | Max:  8m 51s
      🟩 nvcc12.6           Pass: 100%/84  | Total: 12h 56m | Avg:  9m 14s | Max: 43m 17s | Hits:  60%/8999  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 39m 39s | Avg: 19m 49s | Max: 22m 04s
      🟩 nvcc               Pass: 100%/102 | Total: 15h 16m | Avg:  8m 59s | Max: 43m 17s | Hits:  57%/11736 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 39m 15s | Avg:  6m 32s | Max:  9m 25s
      🟩 Clang10            Pass: 100%/3   | Total: 19m 30s | Avg:  6m 30s | Max:  8m 08s
      🟩 Clang11            Pass: 100%/4   | Total: 29m 11s | Avg:  7m 17s | Max:  8m 00s
      🟩 Clang12            Pass: 100%/4   | Total: 23m 16s | Avg:  5m 49s | Max:  7m 14s
      🟩 Clang13            Pass: 100%/4   | Total: 23m 54s | Avg:  5m 58s | Max:  7m 52s
      🟩 Clang14            Pass: 100%/4   | Total: 24m 04s | Avg:  6m 01s | Max:  7m 40s
      🟩 Clang15            Pass: 100%/4   | Total: 27m 30s | Avg:  6m 52s | Max:  7m 20s
      🟩 Clang16            Pass: 100%/4   | Total: 25m 52s | Avg:  6m 28s | Max:  7m 19s
      🟩 Clang17            Pass: 100%/4   | Total: 22m 00s | Avg:  5m 30s | Max:  7m 08s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 37m | Avg: 12m 12s | Max: 29m 06s
      🟩 GCC6               Pass: 100%/2   | Total:  9m 21s | Avg:  4m 40s | Max:  6m 28s
      🟩 GCC7               Pass: 100%/6   | Total: 45m 01s | Avg:  7m 30s | Max:  8m 56s
      🟩 GCC8               Pass: 100%/6   | Total: 35m 44s | Avg:  5m 57s | Max:  7m 11s
      🟩 GCC9               Pass: 100%/6   | Total: 50m 28s | Avg:  8m 24s | Max: 25m 03s
      🟩 GCC10              Pass: 100%/4   | Total: 24m 57s | Avg:  6m 14s | Max:  7m 18s
      🟩 GCC11              Pass: 100%/7   | Total: 52m 23s | Avg:  7m 29s | Max:  8m 51s
      🟩 GCC12              Pass: 100%/4   | Total: 23m 38s | Avg:  5m 54s | Max:  7m 33s
      🟩 GCC13              Pass: 100%/17  | Total:  3h 30m | Avg: 12m 22s | Max: 42m 22s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 42m 31s | Avg: 14m 10s | Max: 30m 39s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 34m 32s | Avg: 34m 32s | Max: 34m 32s | Hits:  48%/2737  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 52m 22s | Avg: 26m 11s | Max: 38m 38s | Hits:  70%/5835  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 43m 17s | Avg: 43m 17s | Max: 43m 17s | Hits:  42%/3164  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/45  | Total:  5h 32m | Avg:  7m 22s | Max: 29m 06s
      🟩 GCC                Pass: 100%/52  | Total:  7h 31m | Avg:  8m 41s | Max: 42m 22s
      🟩 Intel              Pass: 100%/3   | Total: 42m 31s | Avg: 14m 10s | Max: 30m 39s
      🟩 MSVC               Pass: 100%/4   | Total:  2h 10m | Avg: 32m 32s | Max: 43m 17s | Hits:  57%/11736 
    🟩 gpu
      🟩 v100               Pass: 100%/104 | Total: 15h 56m | Avg:  9m 11s | Max: 43m 17s | Hits:  57%/11736 
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total: 12h 45m | Avg:  7m 58s | Max: 43m 17s | Hits:  57%/11736 
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 36m | Avg: 24m 10s | Max: 27m 04s
      🟩 Test               Pass: 100%/3   | Total:  1h 31m | Avg: 30m 36s | Max: 42m 22s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 26s | Avg:  2m 26s | Max:  2m 26s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 24m 45s | Avg:  8m 15s | Max:  8m 51s
      🟩 90a                Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 40s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total:  3h 13m | Avg:  6m 54s | Max: 25m 40s
      🟩 14                 Pass: 100%/28  | Total:  3h 59m | Avg:  8m 32s | Max: 34m 32s | Hits:  74%/5576  
      🟩 17                 Pass: 100%/27  | Total:  4h 37m | Avg: 10m 16s | Max: 38m 38s | Hits:  43%/2996  
      🟩 20                 Pass: 100%/20  | Total:  4h 04m | Avg: 12m 12s | Max: 43m 17s | Hits:  42%/3164  
    
  • 🟩 thrust: Pass: 100%/103 | Total: 1d 23h | Avg: 27m 50s | Max: 1h 10m | Hits: 68%/13095

    🟩 cpu
      🟩 amd64              Pass: 100%/95  | Total:  1d 20h | Avg: 28m 03s | Max:  1h 10m | Hits:  68%/13095 
      🟩 arm64              Pass: 100%/8   | Total:  3h 22m | Avg: 25m 15s | Max: 35m 17s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 27m | Avg: 25m 48s | Max:  1h 06m | Hits:  61%/2619  
      🟩 11.8               Pass: 100%/3   | Total:  1h 37m | Avg: 32m 24s | Max: 50m 04s
      🟩 12.6               Pass: 100%/85  | Total:  1d 15h | Avg: 28m 01s | Max:  1h 10m | Hits:  70%/10476 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 55m 12s | Avg: 27m 36s | Max: 28m 48s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 27m | Avg: 25m 48s | Max:  1h 06m | Hits:  61%/2619  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 37m | Avg: 32m 24s | Max: 50m 04s
      🟩 nvcc12.6           Pass: 100%/83  | Total:  1d 14h | Avg: 28m 02s | Max:  1h 10m | Hits:  70%/10476 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 55m 12s | Avg: 27m 36s | Max: 28m 48s
      🟩 nvcc               Pass: 100%/101 | Total:  1d 22h | Avg: 27m 50s | Max:  1h 10m | Hits:  68%/13095 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 02m | Avg: 30m 27s | Max: 33m 55s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 35m | Avg: 31m 45s | Max: 34m 43s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 19s | Max: 34m 19s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 36s | Max: 35m 17s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 42s | Max: 33m 41s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 01s | Max: 34m 18s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 54s | Max: 34m 20s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 06s | Max: 34m 11s
      🟩 Clang17            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 33s | Max: 36m 14s
      🟩 Clang18            Pass: 100%/9   | Total:  3h 29m | Avg: 23m 16s | Max: 38m 43s
      🟩 GCC6               Pass: 100%/2   | Total: 34m 48s | Avg: 17m 24s | Max: 30m 18s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 21m | Avg: 23m 33s | Max: 39m 06s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 24m | Avg: 24m 05s | Max: 38m 19s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 22m | Avg: 23m 44s | Max: 34m 53s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 47m | Avg: 26m 54s | Max: 35m 23s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 29m | Avg: 29m 59s | Max: 50m 04s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 57m | Avg: 29m 16s | Max: 41m 05s
      🟩 GCC13              Pass: 100%/14  | Total:  4h 42m | Avg: 20m 10s | Max: 39m 06s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 06m | Avg: 42m 06s | Max: 50m 57s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m | Hits:  61%/2619  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 03s | Max:  1h 00m | Hits:  61%/5238  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 36m | Avg: 48m 00s | Max:  1h 10m | Hits:  80%/5238  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total: 21h 20m | Avg: 27m 50s | Max: 38m 43s
      🟩 GCC                Pass: 100%/49  | Total: 19h 40m | Avg: 24m 05s | Max: 50m 04s
      🟩 Intel              Pass: 100%/3   | Total:  2h 06m | Avg: 42m 06s | Max: 50m 57s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 40m | Avg: 56m 04s | Max:  1h 10m | Hits:  68%/13095 
    🟩 gpu
      🟩 v100               Pass: 100%/103 | Total:  1d 23h | Avg: 27m 50s | Max:  1h 10m | Hits:  68%/13095 
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  1d 22h | Avg: 28m 48s | Max:  1h 10m | Hits:  61%/10476 
      🟩 TestCPU            Pass: 100%/4   | Total: 49m 06s | Avg: 12m 16s | Max: 25m 04s | Hits:  99%/2619  
      🟩 TestGPU            Pass: 100%/3   | Total: 53m 10s | Avg: 17m 43s | Max: 29m 29s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 37m | Avg: 32m 24s | Max: 50m 04s
      🟩 90a                Pass: 100%/4   | Total:  1h 13m | Avg: 18m 29s | Max: 20m 04s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total:  5h 36m | Avg: 12m 01s | Max: 33m 48s
      🟩 14                 Pass: 100%/27  | Total: 15h 27m | Avg: 34m 21s | Max:  1h 06m | Hits:  61%/5238  
      🟩 17                 Pass: 100%/26  | Total: 15h 33m | Avg: 35m 53s | Max:  1h 00m | Hits:  61%/2619  
      🟩 20                 Pass: 100%/22  | Total: 11h 09m | Avg: 30m 26s | Max:  1h 10m | Hits:  80%/5238  
    
  • 🟩 cudax: Pass: 100%/52 | Total: 2h 59m | Avg: 3m 26s | Max: 11m 25s | Hits: 89%/216

    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  2h 47m | Avg:  3m 29s | Max: 11m 25s | Hits:  89%/216   
      🟩 arm64              Pass: 100%/4   | Total: 11m 51s | Avg:  2m 57s | Max:  3m 25s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 02m | Avg:  3m 16s | Max: 10m 43s | Hits:  89%/108   
      🟩 12.6               Pass: 100%/33  | Total:  1h 56m | Avg:  3m 32s | Max: 11m 25s | Hits:  89%/108   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 02m | Avg:  3m 16s | Max: 10m 43s | Hits:  89%/108   
      🟩 nvcc12.6           Pass: 100%/33  | Total:  1h 56m | Avg:  3m 32s | Max: 11m 25s | Hits:  89%/108   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/52  | Total:  2h 59m | Avg:  3m 26s | Max: 11m 25s | Hits:  89%/216   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  6m 20s | Avg:  3m 10s | Max:  3m 32s
      🟩 Clang10            Pass: 100%/2   | Total:  6m 13s | Avg:  3m 06s | Max:  3m 36s
      🟩 Clang11            Pass: 100%/4   | Total: 12m 26s | Avg:  3m 06s | Max:  3m 22s
      🟩 Clang12            Pass: 100%/4   | Total: 11m 57s | Avg:  2m 59s | Max:  3m 27s
      🟩 Clang13            Pass: 100%/4   | Total: 11m 29s | Avg:  2m 52s | Max:  3m 03s
      🟩 Clang14            Pass: 100%/4   | Total: 14m 27s | Avg:  3m 36s | Max:  4m 30s
      🟩 Clang15            Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  3m 26s
      🟩 Clang16            Pass: 100%/4   | Total: 12m 37s | Avg:  3m 09s | Max:  3m 25s
      🟩 Clang17            Pass: 100%/2   | Total:  6m 23s | Avg:  3m 11s | Max:  3m 16s
      🟩 Clang18            Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  4m 42s
      🟩 GCC9               Pass: 100%/2   | Total:  5m 20s | Avg:  2m 40s | Max:  2m 50s
      🟩 GCC10              Pass: 100%/4   | Total: 11m 28s | Avg:  2m 52s | Max:  3m 34s
      🟩 GCC11              Pass: 100%/4   | Total: 11m 25s | Avg:  2m 51s | Max:  3m 33s
      🟩 GCC12              Pass: 100%/7   | Total: 24m 05s | Avg:  3m 26s | Max:  4m 34s
      🟩 GCC13              Pass: 100%/3   | Total:  8m 51s | Avg:  2m 57s | Max:  3m 03s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 43s | Avg: 10m 43s | Max: 10m 43s | Hits:  89%/108   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s | Hits:  89%/108   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 35m | Avg:  3m 11s | Max:  4m 42s
      🟩 GCC                Pass: 100%/20  | Total:  1h 01m | Avg:  3m 03s | Max:  4m 34s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 08s | Avg: 11m 04s | Max: 11m 25s | Hits:  89%/216   
    🟩 gpu
      🟩 v100               Pass: 100%/52  | Total:  2h 59m | Avg:  3m 26s | Max: 11m 25s | Hits:  89%/216   
    🟩 jobs
      🟩 Build              Pass: 100%/47  | Total:  2h 37m | Avg:  3m 21s | Max: 11m 25s | Hits:  89%/216   
      🟩 Test               Pass: 100%/5   | Total: 21m 46s | Avg:  4m 21s | Max:  4m 42s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 30s | Avg:  2m 30s | Max:  2m 30s
      🟩 90a                Pass: 100%/1   | Total:  3m 03s | Avg:  3m 03s | Max:  3m 03s
    🟩 std
      🟩 17                 Pass: 100%/28  | Total:  1h 25m | Avg:  3m 02s | Max:  4m 31s
      🟩 20                 Pass: 100%/24  | Total:  1h 34m | Avg:  3m 55s | Max: 11m 25s | Hits:  89%/216   
    
  • 🟩 cccl: Pass: 100%/6 | Total: 31m 48s | Avg: 5m 18s | Max: 5m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/6   | Total: 31m 48s | Avg:  5m 18s | Max:  5m 55s
    🟩 ctk
      🟩 11.1               Pass: 100%/2   | Total:  8m 37s | Avg:  4m 18s | Max:  4m 39s
      🟩 12.0               Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 53s
      🟩 12.6               Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  5m 55s
    🟩 cudacxx
      🟩 nvcc11.1           Pass: 100%/2   | Total:  8m 37s | Avg:  4m 18s | Max:  4m 39s
      🟩 nvcc12.0           Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 53s
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  5m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/6   | Total: 31m 48s | Avg:  5m 18s | Max:  5m 55s
    🟩 cxx
      🟩 Clang9             Pass: 100%/1   | Total:  4m 39s | Avg:  4m 39s | Max:  4m 39s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 53s | Avg:  5m 53s | Max:  5m 53s
      🟩 Clang18            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 GCC6               Pass: 100%/1   | Total:  3m 58s | Avg:  3m 58s | Max:  3m 58s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 GCC13              Pass: 100%/1   | Total:  5m 55s | Avg:  5m 55s | Max:  5m 55s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/3   | Total: 16m 21s | Avg:  5m 27s | Max:  5m 53s
      🟩 GCC                Pass: 100%/3   | Total: 15m 27s | Avg:  5m 09s | Max:  5m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/6   | Total: 31m 48s | Avg:  5m 18s | Max:  5m 55s
    🟩 jobs
      🟩 Infra              Pass: 100%/6   | Total: 31m 48s | Avg:  5m 18s | Max:  5m 55s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 25s | Avg: 5m 42s | Max: 8m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  8m 51s
    🟩 ctk
      🟩 12.5               Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  8m 51s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  8m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  8m 51s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  8m 51s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  8m 51s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  8m 51s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 34s | Avg:  2m 34s | Max:  2m 34s
      🟩 Test               Pass: 100%/1   | Total:  8m 51s | Avg:  8m 51s | Max:  8m 51s
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 16m 29s | Avg: 16m 29s | Max: 16m 29s
    

👃 Inspect Changes

Modifications in project?

Project
+/- CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
+/- CCCL C Parallel Library

Modifications in project or dependencies?

Project
+/- CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- pycuda
+/- CCCL C Parallel Library

🏃‍ Runner counts (total jobs: 372)

# Runner
298 linux-amd64-cpu16
31 linux-amd64-gpu-v100-latest-1
28 linux-arm64-cpu16
15 windows-amd64-cpu16

@griwes griwes enabled auto-merge (squash) October 23, 2024 01:27
@wmaxey
Member

wmaxey commented Oct 23, 2024

I think that this looks good. I believe I understand how you've inverted the control of the launch path to be under CUB. I will admit I'm only familiar with it from the cccl/c side.

@griwes griwes merged commit cb5bbec into NVIDIA:main Oct 23, 2024
387 checks passed
pciolkosz pushed a commit to pciolkosz/cccl that referenced this pull request Oct 25, 2024
fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 5, 2024
Development

Successfully merging this pull request may close these issues.

[FEA]: Refactor C++/C dispatch layers to avoid code duplication