CUB - Enable DPX Reduction #2286

Merged (40 commits) on Sep 6, 2024


@fbusato (Contributor) commented Aug 23, 2024

Address #2032

Enable DPX SIMD comparison (min/max) instructions for Hopper+ architectures for uint16_t and int16_t data types

Additional optimizations not strictly related to DPX, as well as an overload cleanup, will be part of a separate issue/PR.
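On Hopper, DPX can apply a min or max to two 16-bit values packed into one 32-bit register. A 16-bit min/max reduction can therefore consume two elements per operation. The host-only C++ sketch below emulates that lane-wise behavior to illustrate the idea. All helper names are hypothetical, the sketch assumes an even element count of at least two, and it is not CUB's device code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not CUB code): emulate on the host what a packed
// 16-bit SIMD min does in one operation on two int16_t lanes of a 32-bit word.

// Reinterpret 16 raw bits as a signed value without implementation-defined casts.
inline std::int16_t to_s16(std::uint16_t bits)
{
  std::int16_t value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Place `lo` in bits [0, 16) and `hi` in bits [16, 32).
inline std::uint32_t pack_s16x2(std::int16_t lo, std::int16_t hi)
{
  std::uint16_t lo_bits, hi_bits;
  std::memcpy(&lo_bits, &lo, sizeof(lo_bits));
  std::memcpy(&hi_bits, &hi, sizeof(hi_bits));
  return static_cast<std::uint32_t>(lo_bits) | (static_cast<std::uint32_t>(hi_bits) << 16);
}

inline std::int16_t lane_lo(std::uint32_t v) { return to_s16(static_cast<std::uint16_t>(v & 0xFFFFu)); }
inline std::int16_t lane_hi(std::uint32_t v) { return to_s16(static_cast<std::uint16_t>(v >> 16)); }

// One emulated "packed operation": an independent signed min in each 16-bit lane.
inline std::uint32_t min_s16x2(std::uint32_t a, std::uint32_t b)
{
  return pack_s16x2(std::min(lane_lo(a), lane_lo(b)), std::min(lane_hi(a), lane_hi(b)));
}

// Min-reduce an int16_t array of even length n >= 2. Each loop step consumes two
// elements, and a final scalar min merges the two lanes.
inline std::int16_t reduce_min_s16(const std::int16_t* data, std::size_t n)
{
  std::uint32_t acc = pack_s16x2(data[0], data[1]);
  for (std::size_t i = 2; i + 1 < n; i += 2)
  {
    acc = min_s16x2(acc, pack_s16x2(data[i], data[i + 1]));
  }
  return std::min(lane_lo(acc), lane_hi(acc));
}
```

For example, reducing {5, -3, 7, 100} takes one packed step plus a final lane merge and returns -3.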

NVIDIA H100 80GB HBM3

T{ct}  OffsetT{ct}  Elements{io}  Ref Time    Ref Noise  Cmp Time    Cmp Noise  Diff       %Diff
I16    I32          2^16          7.396 us    2.12%      7.096 us    2.93%      -0.300 us  -4.05%
I16    I32          2^20          9.194 us    2.56%      8.810 us    1.93%      -0.384 us  -4.18%
I16    I32          2^24          23.741 us   1.58%      23.163 us   1.51%      -0.579 us  -2.44%
I16    I32          2^28          190.121 us  2.02%      186.033 us  2.06%      -4.088 us  -2.15%
I16    I64          2^16          8.177 us    2.82%      7.897 us    2.43%      -0.280 us  -3.42%
I16    I64          2^20          9.321 us    2.41%      8.994 us    2.65%      -0.326 us  -3.50%
I16    I64          2^24          23.962 us   1.51%      23.315 us   1.72%      -0.646 us  -2.70%
I16    I64          2^28          188.797 us  2.01%      186.692 us  2.09%      -2.105 us  -1.12%

NVIDIA H200

T{ct}  OffsetT{ct}  Elements{io}  Ref Time    Ref Noise  Cmp Time    Cmp Noise  Diff       %Diff
I16    I32          2^16          6.755 us    2.35%      6.659 us    2.13%      -0.096 us  -1.42%
I16    I32          2^20          8.369 us    2.15%      8.222 us    2.67%      -0.147 us  -1.75%
I16    I32          2^24          20.177 us   1.95%      18.575 us   1.94%      -1.602 us  -7.94%
I16    I32          2^28          140.403 us  1.78%      135.501 us  1.81%      -4.903 us  -3.49%
I16    I64          2^16          7.545 us    1.97%      7.478 us    2.24%      -0.066 us  -0.88%
I16    I64          2^20          8.270 us    2.46%      8.230 us    2.32%      -0.040 us  -0.48%
I16    I64          2^24          20.515 us   1.85%      18.961 us   2.14%      -1.554 us  -7.57%
I16    I64          2^28          138.206 us  1.71%      135.782 us  1.71%      -2.424 us  -1.75%

copy-pr-bot (bot) commented Aug 23, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cub/cub/detail/type_traits.cuh (outdated, resolved)
cub/cub/detail/type_traits.cuh (outdated, resolved)
cub/cub/thread/thread_operators.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/test/catch2_test_device_reduce.cu (outdated, resolved)
cub/test/catch2_test_device_segmented_sort_pairs.cu (outdated, resolved)
@fbusato requested a review from a team as a code owner on August 26, 2024 19:19

@fbusato (Contributor Author) commented Aug 27, 2024

Thanks @bernhardmgruber @miscco @mfbalin for all your suggestions!
The current code addresses all of them.
Please also take a look at the performance results in the PR description.

@bernhardmgruber (Contributor) left a comment

In principle, LGTM. Here are some more suggestions:

cub/cub/detail/type_traits.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
@gevtushenko (Collaborator) commented

/ok to test

@fbusato (Contributor Author) commented Sep 4, 2024

/ok to test

@fbusato (Contributor Author) commented Sep 4, 2024

@gevtushenko @bernhardmgruber @miscco I have essentially rewritten thread_reduce.cuh for C++11 compliance and added several improvements. Could you please review the code in thread/thread_reduce.cuh and detail/type_traits.cuh?

@bernhardmgruber (Contributor) left a comment

Here is some feedback:

Comment on lines +55 to +56
#define _CUB_TEMPLATE_REQUIRES(...) ::cuda::std::__enable_if_t<(__VA_ARGS__)>* = nullptr


Suggestion: This feels generic enough to be part of libcu++ and be named _CCCL_TEMPLATE_REQUIRES. @miscco maybe this is already covered by your concept emulation?
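`_CUB_TEMPLATE_REQUIRES(cond)` expands to an unnamed, defaulted pointer non-type template parameter whose type exists only when `cond` is true. A template that carries it therefore drops out of overload resolution (SFINAE) when the condition is false. The plain C++11 sketch below mirrors that pattern to select a dedicated overload for 16-bit integers. It swaps libcu++'s internal `::cuda::std::__enable_if_t` for `std::enable_if`, and the macro, trait, and function names are hypothetical.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the reviewed macro, using std::enable_if so it
// compiles as plain C++11 outside of libcu++.
#define SKETCH_TEMPLATE_REQUIRES(...) typename std::enable_if<(__VA_ARGS__)>::type* = nullptr

// Hypothetical trait: true for 16-bit integer types such as int16_t and uint16_t.
template <typename T>
struct is_16bit_integral
    : std::integral_constant<bool, std::is_integral<T>::value && sizeof(T) == 2>
{};

// Viable only for 16-bit integers (where a packed min/max path could be used).
template <typename T, SKETCH_TEMPLATE_REQUIRES(is_16bit_integral<T>::value)>
constexpr int reduce_path(T)
{
  return 16;
}

// Viable for every other type.
template <typename T, SKETCH_TEMPLATE_REQUIRES(!is_16bit_integral<T>::value)>
constexpr int reduce_path(T)
{
  return 0;
}
```

The two conditions are mutually exclusive, so exactly one overload survives for any argument type. For example, `reduce_path(std::int16_t{1})` picks the 16-bit overload and `reduce_path(1)` picks the generic one.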

cub/cub/detail/type_traits.cuh (outdated, resolved)
cub/cub/detail/type_traits.cuh (outdated, resolved)
cub/cub/detail/type_traits.cuh (resolved)
cub/cub/detail/type_traits.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
cub/cub/thread/thread_reduce.cuh (outdated, resolved)
@fbusato (Contributor Author) commented Sep 4, 2024

/ok to test

@fbusato (Contributor Author) commented Sep 4, 2024

/ok to test

@fbusato (Contributor Author) commented Sep 4, 2024

/ok to test

github-actions (bot) commented Sep 5, 2024

🟨 CI finished in 4h 44m: Pass: 83%/251 | Total: 6d 07h | Avg: 36m 16s | Max: 1h 12m | Hits: 69%/24387
  • 🟨 cub: Pass: 75%/132 | Total: 3d 21h | Avg: 42m 27s | Max: 1h 12m | Hits: 33%/4308

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  73%/124 | Total:  3d 14h | Avg: 41m 44s | Max:  1h 12m | Hits:  33%/4308  
      🟩 arm64              Pass: 100%/8   | Total:  7h 08m | Avg: 53m 33s | Max: 55m 18s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 17m | Avg: 45m 10s | Max: 58m 35s | Hits:  33%/718   
      🟩 11.8               Pass: 100%/3   | Total:  3h 28m | Avg:  1h 09m | Max:  1h 11m
      🔍 12.5               Pass:  71%/114 | Total:  3d 06h | Avg: 41m 23s | Max:  1h 12m | Hits:  33%/3590  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 49m 23s | Avg: 24m 41s | Max: 25m 45s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 17m | Avg: 45m 10s | Max: 58m 35s | Hits:  33%/718   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 28m | Avg:  1h 09m | Max:  1h 11m
      🔍 nvcc12.5           Pass:  70%/112 | Total:  3d 05h | Avg: 41m 41s | Max:  1h 12m | Hits:  33%/3590  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 49m 23s | Avg: 24m 41s | Max: 25m 45s
      🔍 nvcc               Pass:  74%/130 | Total:  3d 20h | Avg: 42m 43s | Max:  1h 12m | Hits:  33%/4308  
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 44m | Avg: 47m 20s | Max: 53m 19s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 42m | Avg: 54m 15s | Max: 56m 45s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 40m | Avg: 55m 03s | Max: 56m 29s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 33s | Max: 56m 18s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 32m | Avg: 53m 07s | Max: 55m 39s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 41s | Max: 55m 22s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 33m | Avg: 53m 22s | Max: 55m 27s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 46m | Avg: 56m 38s | Max: 57m 48s
      🟨 Clang17            Pass:  38%/26  | Total: 11h 27m | Avg: 26m 25s | Max: 54m 18s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 26m | Avg: 43m 00s | Max: 44m 13s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 43m | Avg: 47m 16s | Max: 50m 49s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 52m | Avg: 48m 45s | Max: 56m 58s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 51m | Avg: 48m 31s | Max: 53m 27s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 26m | Avg: 51m 37s | Max: 55m 40s
      🟩 GCC11              Pass: 100%/7   | Total:  7h 08m | Avg:  1h 01m | Max:  1h 11m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 42m | Avg: 55m 43s | Max: 57m 36s
      🟨 GCC13              Pass:  41%/29  | Total: 13h 15m | Avg: 27m 26s | Max: 55m 18s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 52m | Avg: 57m 21s | Max: 58m 40s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 58m 35s | Avg: 58m 35s | Max: 58m 35s | Hits:  33%/718   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 12m | Hits:  33%/1436  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 22m | Avg:  1h 07m | Max:  1h 09m | Hits:  33%/2154  
    🟨 cxx_family
      🟨 Clang              Pass:  72%/59  | Total:  1d 16h | Avg: 41m 08s | Max: 57m 48s
      🟨 GCC                Pass:  73%/64  | Total:  1d 19h | Avg: 40m 43s | Max:  1h 11m
      🟩 Intel              Pass: 100%/3   | Total:  2h 52m | Avg: 57m 21s | Max: 58m 40s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 38m | Avg:  1h 06m | Max:  1h 12m | Hits:  33%/4308  
    🟨 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 13h | Avg: 51m 44s | Max:  1h 12m | Hits:  33%/4308  
      🟥 DeviceLaunch       Pass:   0%/8   | Total:  1h 32m | Avg: 11m 33s | Max: 14m 06s
      🟥 GraphCapture       Pass:   0%/8   | Total:  1h 22m | Avg: 10m 16s | Max: 11m 51s
      🟥 HostLaunch         Pass:   0%/8   | Total:  1h 33m | Avg: 11m 43s | Max: 14m 05s
      🟥 SmallGMem          Pass:   0%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
      🟥 TestGPU            Pass:   0%/8   | Total:  3h 04m | Avg: 23m 04s | Max: 31m 07s
    🟨 gpu
      🟨 v100               Pass:  75%/132 | Total:  3d 21h | Avg: 42m 27s | Max:  1h 12m | Hits:  33%/4308  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 28m | Avg:  1h 09m | Max:  1h 11m
      🟩 90a                Pass: 100%/4   | Total:  1h 40m | Avg: 25m 10s | Max: 25m 56s
    🟨 std
      🟨 11                 Pass:  76%/34  | Total:  1d 00h | Avg: 42m 25s | Max:  1h 07m
      🟨 14                 Pass:  78%/37  | Total:  1d 03h | Avg: 44m 18s | Max:  1h 11m | Hits:  33%/2154  
      🟨 17                 Pass:  75%/37  | Total:  1d 02h | Avg: 43m 04s | Max:  1h 12m | Hits:  33%/1436  
      🟨 20                 Pass:  66%/24  | Total: 15h 28m | Avg: 38m 42s | Max:  1h 09m | Hits:  33%/718   
    
  • 🟨 thrust: Pass: 93%/118 | Total: 2d 10h | Avg: 29m 33s | Max: 1h 09m | Hits: 76%/20079

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/110 | Total:  2d 06h | Avg: 29m 40s | Max:  1h 09m | Hits:  76%/20079 
      🟩 arm64              Pass: 100%/8   | Total:  3h 44m | Avg: 28m 07s | Max: 31m 45s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  7h 19m | Avg: 29m 19s | Max: 54m 03s | Hits:  65%/2231  
      🟩 11.8               Pass: 100%/3   | Total:  2h 04m | Avg: 41m 37s | Max: 48m 02s
      🔍 12.5               Pass:  92%/100 | Total:  2d 00h | Avg: 29m 14s | Max:  1h 09m | Hits:  78%/17848 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 59m 29s | Avg: 29m 44s | Max: 30m 43s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 19m | Avg: 29m 19s | Max: 54m 03s | Hits:  65%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 04m | Avg: 41m 37s | Max: 48m 02s
      🔍 nvcc12.5           Pass:  91%/98  | Total:  1d 23h | Avg: 29m 13s | Max:  1h 09m | Hits:  78%/17848 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 59m 29s | Avg: 29m 44s | Max: 30m 43s
      🔍 nvcc               Pass:  93%/116 | Total:  2d 09h | Avg: 29m 33s | Max:  1h 09m | Hits:  76%/20079 
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/99  | Total:  2d 06h | Avg: 32m 45s | Max:  1h 09m | Hits:  65%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 11m | Avg: 11m 55s | Max: 24m 55s | Hits:  99%/6693  
      🔥 TestGPU            Pass:   0%/8   | Total:  1h 54m | Avg: 14m 19s | Max: 17m 57s
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 54m | Avg: 29m 08s | Max: 36m 25s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 38m | Avg: 32m 52s | Max: 38m 09s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 57s | Max: 34m 34s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 41s | Max: 34m 13s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 49s | Max: 33m 07s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 16s | Max: 34m 37s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 58m | Avg: 29m 39s | Max: 33m 27s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 08s | Max: 33m 48s
      🟨 Clang17            Pass:  77%/18  | Total:  6h 23m | Avg: 21m 18s | Max: 36m 16s
      🟩 GCC6               Pass: 100%/2   | Total: 54m 25s | Avg: 27m 12s | Max: 30m 39s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 49m | Avg: 28m 19s | Max: 33m 17s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 58m | Avg: 29m 48s | Max: 35m 07s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 05m | Avg: 30m 53s | Max: 35m 05s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 15m | Avg: 33m 49s | Max: 36m 22s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 17m | Avg: 36m 45s | Max: 48m 02s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 14m | Avg: 33m 41s | Max: 38m 34s
      🟨 GCC13              Pass:  80%/20  | Total:  6h 57m | Avg: 20m 52s | Max: 40m 07s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 52m | Avg: 37m 33s | Max: 45m 01s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 54m 03s | Avg: 54m 03s | Max: 54m 03s | Hits:  65%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  65%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 28m | Avg: 44m 40s | Max:  1h 09m | Hits:  82%/13386 
    🟨 cxx_family
      🟨 Clang              Pass:  92%/51  | Total: 23h 15m | Avg: 27m 21s | Max: 38m 09s
      🟨 GCC                Pass:  92%/55  | Total:  1d 01h | Avg: 27m 52s | Max: 48m 02s
      🟩 Intel              Pass: 100%/3   | Total:  1h 52m | Avg: 37m 33s | Max: 45m 01s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 27m | Avg: 49m 44s | Max:  1h 09m | Hits:  76%/20079 
    🟨 gpu
      🟨 v100               Pass:  93%/118 | Total:  2d 10h | Avg: 29m 33s | Max:  1h 09m | Hits:  76%/20079 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 04m | Avg: 41m 37s | Max: 48m 02s
      🟩 90a                Pass: 100%/4   | Total:  1h 16m | Avg: 19m 02s | Max: 21m 13s
    🟨 std
      🟨 11                 Pass:  93%/30  | Total: 11h 49m | Avg: 23m 39s | Max: 34m 00s
      🟨 14                 Pass:  94%/34  | Total: 17h 51m | Avg: 31m 31s | Max:  1h 02m | Hits:  74%/8924  
      🟨 17                 Pass:  93%/33  | Total: 17h 52m | Avg: 32m 30s | Max:  1h 04m | Hits:  76%/6693  
      🟨 20                 Pass:  90%/21  | Total: 10h 34m | Avg: 30m 13s | Max:  1h 09m | Hits:  82%/4462  
    
  • 🟥 pycuda: Pass: 0%/1 | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    🟥 ctk
      🟥 12.5               Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    🟥 cudacxx
      🟥 nvcc12.5           Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CUDA C Core Library

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

github-actions (bot) commented Sep 5, 2024

🟨 CI finished in 7h 14m: Pass: 83%/251 | Total: 6d 08h | Avg: 36m 26s | Max: 1h 12m | Hits: 69%/24387
  • 🟨 cub: Pass: 75%/132 | Total: 3d 21h | Avg: 42m 34s | Max: 1h 12m | Hits: 33%/4308

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  73%/124 | Total:  3d 14h | Avg: 41m 52s | Max:  1h 12m | Hits:  33%/4308  
      🟩 arm64              Pass: 100%/8   | Total:  7h 08m | Avg: 53m 33s | Max: 55m 18s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 17m | Avg: 45m 10s | Max: 58m 35s | Hits:  33%/718   
      🟩 11.8               Pass: 100%/3   | Total:  3h 28m | Avg:  1h 09m | Max:  1h 11m
      🔍 12.5               Pass:  71%/114 | Total:  3d 06h | Avg: 41m 31s | Max:  1h 12m | Hits:  33%/3590  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 49m 23s | Avg: 24m 41s | Max: 25m 45s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 17m | Avg: 45m 10s | Max: 58m 35s | Hits:  33%/718   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 28m | Avg:  1h 09m | Max:  1h 11m
      🔍 nvcc12.5           Pass:  70%/112 | Total:  3d 06h | Avg: 41m 49s | Max:  1h 12m | Hits:  33%/3590  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 49m 23s | Avg: 24m 41s | Max: 25m 45s
      🔍 nvcc               Pass:  74%/130 | Total:  3d 20h | Avg: 42m 51s | Max:  1h 12m | Hits:  33%/4308  
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 44m | Avg: 47m 20s | Max: 53m 19s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 42m | Avg: 54m 15s | Max: 56m 45s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 40m | Avg: 55m 03s | Max: 56m 29s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 33s | Max: 56m 18s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 32m | Avg: 53m 07s | Max: 55m 39s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 41s | Max: 55m 22s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 33m | Avg: 53m 22s | Max: 55m 27s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 46m | Avg: 56m 38s | Max: 57m 48s
      🟨 Clang17            Pass:  38%/26  | Total: 11h 57m | Avg: 27m 35s | Max: 54m 18s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 26m | Avg: 43m 00s | Max: 44m 13s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 43m | Avg: 47m 16s | Max: 50m 49s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 52m | Avg: 48m 45s | Max: 56m 58s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 51m | Avg: 48m 31s | Max: 53m 27s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 26m | Avg: 51m 37s | Max: 55m 40s
      🟩 GCC11              Pass: 100%/7   | Total:  7h 08m | Avg:  1h 01m | Max:  1h 11m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 42m | Avg: 55m 43s | Max: 57m 36s
      🟨 GCC13              Pass:  41%/29  | Total: 13h 01m | Avg: 26m 56s | Max: 55m 18s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 52m | Avg: 57m 21s | Max: 58m 40s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 58m 35s | Avg: 58m 35s | Max: 58m 35s | Hits:  33%/718   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 12m | Hits:  33%/1436  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 22m | Avg:  1h 07m | Max:  1h 09m | Hits:  33%/2154  
    🟨 cxx_family
      🟨 Clang              Pass:  72%/59  | Total:  1d 16h | Avg: 41m 39s | Max: 57m 48s
      🟨 GCC                Pass:  73%/64  | Total:  1d 19h | Avg: 40m 29s | Max:  1h 11m
      🟩 Intel              Pass: 100%/3   | Total:  2h 52m | Avg: 57m 21s | Max: 58m 40s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 38m | Avg:  1h 06m | Max:  1h 12m | Hits:  33%/4308  
    🟨 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 13h | Avg: 51m 44s | Max:  1h 12m | Hits:  33%/4308  
      🟥 DeviceLaunch       Pass:   0%/8   | Total:  1h 40m | Avg: 12m 32s | Max: 15m 56s
      🟥 GraphCapture       Pass:   0%/8   | Total:  1h 28m | Avg: 11m 05s | Max: 13m 03s
      🟥 HostLaunch         Pass:   0%/8   | Total:  1h 35m | Avg: 11m 55s | Max: 16m 22s
      🟥 SmallGMem          Pass:   0%/1   | Total: 23m 09s | Avg: 23m 09s | Max: 23m 09s
      🟥 TestGPU            Pass:   0%/8   | Total:  3h 09m | Avg: 23m 39s | Max: 29m 27s
    🟨 gpu
      🟨 v100               Pass:  75%/132 | Total:  3d 21h | Avg: 42m 34s | Max:  1h 12m | Hits:  33%/4308  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 28m | Avg:  1h 09m | Max:  1h 11m
      🟩 90a                Pass: 100%/4   | Total:  1h 40m | Avg: 25m 10s | Max: 25m 56s
    🟨 std
      🟨 11                 Pass:  76%/34  | Total:  1d 00h | Avg: 42m 38s | Max:  1h 07m
      🟨 14                 Pass:  78%/37  | Total:  1d 03h | Avg: 43m 54s | Max:  1h 11m | Hits:  33%/2154  
      🟨 17                 Pass:  75%/37  | Total:  1d 02h | Avg: 43m 31s | Max:  1h 12m | Hits:  33%/1436  
      🟨 20                 Pass:  66%/24  | Total: 15h 35m | Avg: 38m 58s | Max:  1h 09m | Hits:  33%/718   
    
  • 🟨 thrust: Pass: 93%/118 | Total: 2d 10h | Avg: 29m 47s | Max: 1h 09m | Hits: 76%/20079

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/110 | Total:  2d 06h | Avg: 29m 54s | Max:  1h 09m | Hits:  76%/20079 
      🟩 arm64              Pass: 100%/8   | Total:  3h 44m | Avg: 28m 07s | Max: 31m 45s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  7h 19m | Avg: 29m 19s | Max: 54m 03s | Hits:  65%/2231  
      🟩 11.8               Pass: 100%/3   | Total:  2h 04m | Avg: 41m 37s | Max: 48m 02s
      🔍 12.5               Pass:  92%/100 | Total:  2d 01h | Avg: 29m 30s | Max:  1h 09m | Hits:  78%/17848 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 59m 29s | Avg: 29m 44s | Max: 30m 43s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 19m | Avg: 29m 19s | Max: 54m 03s | Hits:  65%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 04m | Avg: 41m 37s | Max: 48m 02s
      🔍 nvcc12.5           Pass:  91%/98  | Total:  2d 00h | Avg: 29m 30s | Max:  1h 09m | Hits:  78%/17848 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 59m 29s | Avg: 29m 44s | Max: 30m 43s
      🔍 nvcc               Pass:  93%/116 | Total:  2d 09h | Avg: 29m 47s | Max:  1h 09m | Hits:  76%/20079 
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/99  | Total:  2d 06h | Avg: 32m 45s | Max:  1h 09m | Hits:  65%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 11m | Avg: 11m 55s | Max: 24m 55s | Hits:  99%/6693  
      🔥 TestGPU            Pass:   0%/8   | Total:  2h 21m | Avg: 17m 39s | Max: 19m 09s
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 54m | Avg: 29m 08s | Max: 36m 25s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 38m | Avg: 32m 52s | Max: 38m 09s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 57s | Max: 34m 34s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 41s | Max: 34m 13s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 49s | Max: 33m 07s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 16s | Max: 34m 37s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 58m | Avg: 29m 39s | Max: 33m 27s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 08s | Max: 33m 48s
      🟨 Clang17            Pass:  77%/18  | Total:  6h 31m | Avg: 21m 46s | Max: 36m 16s
      🟩 GCC6               Pass: 100%/2   | Total: 54m 25s | Avg: 27m 12s | Max: 30m 39s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 49m | Avg: 28m 19s | Max: 33m 17s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 58m | Avg: 29m 48s | Max: 35m 07s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 05m | Avg: 30m 53s | Max: 35m 05s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 15m | Avg: 33m 49s | Max: 36m 22s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 17m | Avg: 36m 45s | Max: 48m 02s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 14m | Avg: 33m 41s | Max: 38m 34s
      🟨 GCC13              Pass:  80%/20  | Total:  7h 15m | Avg: 21m 47s | Max: 40m 07s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 52m | Avg: 37m 33s | Max: 45m 01s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 54m 03s | Avg: 54m 03s | Max: 54m 03s | Hits:  65%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  65%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 28m | Avg: 44m 40s | Max:  1h 09m | Hits:  82%/13386 
    🟨 cxx_family
      🟨 Clang              Pass:  92%/51  | Total: 23h 23m | Avg: 27m 31s | Max: 38m 09s
      🟨 GCC                Pass:  92%/55  | Total:  1d 01h | Avg: 28m 12s | Max: 48m 02s
      🟩 Intel              Pass: 100%/3   | Total:  1h 52m | Avg: 37m 33s | Max: 45m 01s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 27m | Avg: 49m 44s | Max:  1h 09m | Hits:  76%/20079 
    🟨 gpu
      🟨 v100               Pass:  93%/118 | Total:  2d 10h | Avg: 29m 47s | Max:  1h 09m | Hits:  76%/20079 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 04m | Avg: 41m 37s | Max: 48m 02s
      🟩 90a                Pass: 100%/4   | Total:  1h 16m | Avg: 19m 02s | Max: 21m 13s
    🟨 std
      🟨 11                 Pass:  93%/30  | Total: 11h 52m | Avg: 23m 44s | Max: 34m 00s
      🟨 14                 Pass:  94%/34  | Total: 17h 58m | Avg: 31m 43s | Max:  1h 02m | Hits:  74%/8924  
      🟨 17                 Pass:  93%/33  | Total: 18h 07m | Avg: 32m 56s | Max:  1h 04m | Hits:  76%/6693  
      🟨 20                 Pass:  90%/21  | Total: 10h 37m | Avg: 30m 20s | Max:  1h 09m | Hits:  82%/4462  
    
  • 🟥 pycuda: Pass: 0%/1 | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    🟥 ctk
      🟥 12.5               Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    🟥 cudacxx
      🟥 nvcc12.5           Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 13m 07s | Avg: 13m 07s | Max: 13m 07s
    


@fbusato (Contributor Author) commented Sep 5, 2024

/ok to test

@fbusato (Contributor Author) commented Sep 5, 2024

/ok to test

github-actions (bot) commented Sep 6, 2024

🟩 CI finished in 6h 28m: Pass: 100%/251 | Total: 6d 16h | Avg: 38m 22s | Max: 1h 36m | Hits: 68%/24387
  • 🟩 cub: Pass: 100%/132 | Total: 4d 06h | Avg: 46m 34s | Max: 1h 36m | Hits: 28%/4308

    🟩 cpu
      🟩 amd64              Pass: 100%/124 | Total:  3d 23h | Avg: 46m 05s | Max:  1h 36m | Hits:  28%/4308  
      🟩 arm64              Pass: 100%/8   | Total:  7h 12m | Avg: 54m 06s | Max: 55m 57s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 42m | Avg: 46m 48s | Max: 53m 04s | Hits:  28%/718   
      🟩 11.8               Pass: 100%/3   | Total:  3h 36m | Avg:  1h 12m | Max:  1h 16m
      🟩 12.5               Pass: 100%/114 | Total:  3d 15h | Avg: 45m 52s | Max:  1h 36m | Hits:  28%/3590  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 48m 09s | Avg: 24m 04s | Max: 24m 45s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 42m | Avg: 46m 48s | Max: 53m 04s | Hits:  28%/718   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 36m | Avg:  1h 12m | Max:  1h 16m
      🟩 nvcc12.5           Pass: 100%/112 | Total:  3d 14h | Avg: 46m 15s | Max:  1h 36m | Hits:  28%/3590  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 48m 09s | Avg: 24m 04s | Max: 24m 45s
      🟩 nvcc               Pass: 100%/130 | Total:  4d 05h | Avg: 46m 55s | Max:  1h 36m | Hits:  28%/4308  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  5h 01m | Avg: 50m 10s | Max: 56m 53s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 33m | Avg: 51m 05s | Max: 51m 37s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 38m | Avg: 54m 37s | Max: 56m 33s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 42m | Avg: 55m 40s | Max: 57m 04s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 35m | Avg: 53m 54s | Max: 57m 06s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 29m | Avg: 52m 18s | Max: 54m 56s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 38m | Avg: 54m 41s | Max: 58m 14s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 38m | Avg: 54m 34s | Max: 56m 30s
      🟩 Clang17            Pass: 100%/26  | Total: 13h 59m | Avg: 32m 17s | Max: 56m 03s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 31m | Avg: 45m 31s | Max: 48m 20s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 49m | Avg: 48m 11s | Max: 52m 56s
      🟩 GCC8               Pass: 100%/6   | Total:  5h 05m | Avg: 50m 53s | Max: 56m 29s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 51m | Avg: 48m 37s | Max: 51m 59s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 46m | Avg: 56m 35s | Max: 57m 57s
      🟩 GCC11              Pass: 100%/7   | Total:  7h 16m | Avg:  1h 02m | Max:  1h 16m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 32m | Avg: 53m 01s | Max: 55m 43s
      🟩 GCC13              Pass: 100%/29  | Total: 18h 53m | Avg: 39m 04s | Max:  1h 36m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 51m | Avg: 57m 12s | Max:  1h 00m
      🟩 MSVC14.16          Pass: 100%/1   | Total: 53m 04s | Avg: 53m 04s | Max: 53m 04s | Hits:  28%/718   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 06m | Hits:  28%/1436  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 28m | Avg:  1h 09m | Max:  1h 10m | Hits:  28%/2154  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 19h | Avg: 44m 00s | Max: 58m 14s
      🟩 GCC                Pass: 100%/64  | Total:  2d 01h | Avg: 46m 38s | Max:  1h 36m
      🟩 Intel              Pass: 100%/3   | Total:  2h 51m | Avg: 57m 12s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/6   | Total:  6h 34m | Avg:  1h 05m | Max:  1h 10m | Hits:  28%/4308  
    🟩 gpu
      🟩 v100               Pass: 100%/132 | Total:  4d 06h | Avg: 46m 34s | Max:  1h 36m | Hits:  28%/4308  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 14h | Avg: 52m 18s | Max:  1h 16m | Hits:  28%/4308  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  3h 48m | Avg: 28m 34s | Max:  1h 30m
      🟩 GraphCapture       Pass: 100%/8   | Total:  3h 20m | Avg: 25m 02s | Max:  1h 26m
      🟩 HostLaunch         Pass: 100%/8   | Total:  3h 40m | Avg: 27m 32s | Max:  1h 13m
      🟩 SmallGMem          Pass: 100%/1   | Total: 31m 41s | Avg: 31m 41s | Max: 31m 41s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 48m | Avg: 36m 02s | Max:  1h 36m
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 36m | Avg:  1h 12m | Max:  1h 16m
      🟩 90a                Pass: 100%/4   | Total:  1h 34m | Avg: 23m 37s | Max: 25m 48s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  1d 05h | Avg: 51m 59s | Max:  1h 36m
      🟩 14                 Pass: 100%/37  | Total:  1d 04h | Avg: 46m 04s | Max:  1h 09m | Hits:  28%/2154  
      🟩 17                 Pass: 100%/37  | Total:  1d 03h | Avg: 45m 14s | Max:  1h 14m | Hits:  28%/1436  
      🟩 20                 Pass: 100%/24  | Total: 16h 41m | Avg: 41m 43s | Max:  1h 08m | Hits:  28%/718   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 09h | Avg: 29m 24s | Max: 1h 06m | Hits: 76%/20079

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  2d 06h | Avg: 29m 27s | Max:  1h 06m | Hits:  76%/20079 
      🟩 arm64              Pass: 100%/8   | Total:  3h 49m | Avg: 28m 39s | Max: 32m 16s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 13m | Avg: 28m 53s | Max:  1h 00m | Hits:  65%/2231  
      🟩 11.8               Pass: 100%/3   | Total:  2h 04m | Avg: 41m 31s | Max: 48m 38s
      🟩 12.5               Pass: 100%/100 | Total:  2d 00h | Avg: 29m 07s | Max:  1h 06m | Hits:  78%/17848 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 57m 52s | Avg: 28m 56s | Max: 29m 51s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 13m | Avg: 28m 53s | Max:  1h 00m | Hits:  65%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 04m | Avg: 41m 31s | Max: 48m 38s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 23h | Avg: 29m 07s | Max:  1h 06m | Hits:  78%/17848 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 57m 52s | Avg: 28m 56s | Max: 29m 51s
      🟩 nvcc               Pass: 100%/116 | Total:  2d 08h | Avg: 29m 24s | Max:  1h 06m | Hits:  76%/20079 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 46m | Avg: 27m 49s | Max: 33m 53s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 36m | Avg: 32m 05s | Max: 33m 30s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 55m | Avg: 28m 58s | Max: 31m 05s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 46s | Max: 34m 01s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 04m | Avg: 31m 14s | Max: 35m 17s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 31s | Max: 36m 24s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 58s | Max: 34m 18s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 04m | Avg: 31m 13s | Max: 34m 51s
      🟩 Clang17            Pass: 100%/18  | Total:  6h 09m | Avg: 20m 30s | Max: 32m 54s
      🟩 GCC6               Pass: 100%/2   | Total: 49m 20s | Avg: 24m 40s | Max: 27m 50s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 58m | Avg: 29m 47s | Max: 34m 33s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 59m | Avg: 29m 51s | Max: 36m 55s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 06m | Avg: 31m 03s | Max: 37m 55s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 13m | Avg: 33m 23s | Max: 35m 49s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 15m | Avg: 36m 32s | Max: 48m 38s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 21m | Avg: 35m 22s | Max: 40m 03s
      🟩 GCC13              Pass: 100%/20  | Total:  6h 57m | Avg: 20m 51s | Max: 37m 57s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 57m | Avg: 39m 15s | Max: 43m 16s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  65%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 04m | Hits:  65%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 17m | Avg: 42m 53s | Max:  1h 06m | Hits:  82%/13386 
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 22h 51m | Avg: 26m 52s | Max: 36m 24s
      🟩 GCC                Pass: 100%/55  | Total:  1d 01h | Avg: 28m 01s | Max: 48m 38s
      🟩 Intel              Pass: 100%/3   | Total:  1h 57m | Avg: 39m 15s | Max: 43m 16s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 19m | Avg: 48m 51s | Max:  1h 06m | Hits:  76%/20079 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 09h | Avg: 29m 24s | Max:  1h 06m | Hits:  76%/20079 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 05h | Avg: 32m 40s | Max:  1h 06m | Hits:  65%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 10m | Avg: 11m 49s | Max: 26m 06s | Hits:  99%/6693  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 44m | Avg: 13m 06s | Max: 15m 33s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 04m | Avg: 41m 31s | Max: 48m 38s
      🟩 90a                Pass: 100%/4   | Total:  1h 15m | Avg: 18m 45s | Max: 20m 14s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 11h 44m | Avg: 23m 28s | Max: 34m 10s
      🟩 14                 Pass: 100%/34  | Total: 17h 37m | Avg: 31m 05s | Max:  1h 00m | Hits:  74%/8924  
      🟩 17                 Pass: 100%/33  | Total: 18h 02m | Avg: 32m 48s | Max:  1h 06m | Hits:  76%/6693  
      🟩 20                 Pass: 100%/21  | Total: 10h 25m | Avg: 29m 47s | Max:  1h 02m | Hits:  82%/4462  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 14m 55s | Avg: 14m 55s | Max: 14m 55s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CUDA C Core Library

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

fbusato merged commit 3adc92a into NVIDIA:main on Sep 6, 2024
260 of 264 checks passed
fbusato deleted the cub/dpx-reduction branch on September 6, 2024 21:17