Cleanup CUB block/thread load and exchange #1946

Merged


@bernhardmgruber bernhardmgruber commented Jul 5, 2024

Since I had to read through CUB's block/thread load and block exchange code, here are a few improvements.

Let's also check for any SASS differences, since this code is at the heart of all CUB algorithms:

  • Check whether the SASS of cub.test.block_load.it_11 changed: it is identical before and after this PR.

@bernhardmgruber bernhardmgruber added the cub label (for all items related to CUB) Jul 5, 2024
template <int DUMMY>
struct LoadInternal<BLOCK_LOAD_DIRECT, DUMMY>
{
  /// Shared memory storage layout type
  using TempStorage = NullType;

Starting here, I deleted all the documentation blocks of the LoadInternal specializations, because they are private to BlockLoad and named Internal. Furthermore, their behavior is amply documented in the BlockLoadAlgorithm enumeration. The comments were also highly redundant and, in at least one case, wrong.
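
For readers unfamiliar with the pattern discussed here, the following is a minimal host-only C++ sketch of it, not CUB's actual code. It assumes a hypothetical `LoadAlgorithm` enumeration standing in for `BlockLoadAlgorithm` and a hypothetical `BlockLoadSketch` class standing in for `BlockLoad`; threads are simulated by a plain integer index. The strategies are documented once on the enumeration, while the private `LoadInternal` specializations carry no documentation of their own. The `DUMMY` parameter presumably exists so that each specialization is a partial one, which may be declared at class scope.

```cpp
#include <cassert>

// Hypothetical stand-in for CUB's BlockLoadAlgorithm. This enumeration is the
// single place where each loading strategy is documented.
enum class LoadAlgorithm
{
  direct,  // thread t reads the consecutive items [t * N, (t + 1) * N)
  striped, // thread t reads items t, t + threads, t + 2 * threads, ...
};

// Hypothetical stand-in for cub::BlockLoad, simplified to run on the host.
template <int ItemsPerThread, LoadAlgorithm Algorithm>
class BlockLoadSketch
{
  // Private implementation detail, selected by the enumeration above.
  // DUMMY makes every specialization below a partial specialization.
  template <LoadAlgorithm A, int DUMMY>
  struct LoadInternal;

  template <int DUMMY>
  struct LoadInternal<LoadAlgorithm::direct, DUMMY>
  {
    static void Load(int thread, int /*threads*/, const int* in, int (&items)[ItemsPerThread])
    {
      for (int i = 0; i < ItemsPerThread; ++i)
      {
        items[i] = in[thread * ItemsPerThread + i];
      }
    }
  };

  template <int DUMMY>
  struct LoadInternal<LoadAlgorithm::striped, DUMMY>
  {
    static void Load(int thread, int threads, const int* in, int (&items)[ItemsPerThread])
    {
      for (int i = 0; i < ItemsPerThread; ++i)
      {
        items[i] = in[i * threads + thread];
      }
    }
  };

public:
  // Public entry point: forwards to the specialization chosen at compile time.
  // `in` must hold at least threads * ItemsPerThread elements.
  static void Load(int thread, int threads, const int* in, int (&items)[ItemsPerThread])
  {
    assert(thread >= 0 && thread < threads);
    LoadInternal<Algorithm, 0>::Load(thread, threads, in, items);
  }
};
```

With six input items, three simulated threads and two items per thread, thread 1 receives the third and fourth items under `direct` and the second and fifth items under `striped`.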


github-actions bot commented Jul 6, 2024

🟩 CI finished in 2h 56m: Pass: 100%/249 | Total: 4d 20h | Avg: 28m 03s | Max: 52m 27s | Hits: 61%/248564
  • 🟩 cub: Pass: 100%/131 | Total: 2d 19h | Avg: 30m 51s | Max: 50m 52s | Hits: 52%/109298

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  2d 14h | Avg: 30m 38s | Max: 50m 52s | Hits:  53%/102474
      🟩 arm64              Pass: 100%/8   | Total:  4h 34m | Avg: 34m 17s | Max: 36m 44s | Hits:  37%/6824  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 56m | Avg: 31m 45s | Max: 48m 05s | Hits:  35%/11583 
      🟩 11.8               Pass: 100%/3   | Total:  2h 23m | Avg: 47m 47s | Max: 50m 52s | Hits:  37%/2559  
      🟩 12.5               Pass: 100%/113 | Total:  2d 09h | Avg: 30m 17s | Max: 46m 21s | Hits:  55%/95156 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 43m 33s | Avg: 21m 46s | Max: 22m 12s | Hits:  39%/1410  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 56m | Avg: 31m 45s | Max: 48m 05s | Hits:  35%/11583 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 23m | Avg: 47m 47s | Max: 50m 52s | Hits:  37%/2559  
      🟩 nvcc12.5           Pass: 100%/111 | Total:  2d 08h | Avg: 30m 26s | Max: 46m 21s | Hits:  55%/93746 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 43m 33s | Avg: 21m 46s | Max: 22m 12s | Hits:  39%/1410  
      🟩 nvcc               Pass: 100%/129 | Total:  2d 18h | Avg: 31m 00s | Max: 50m 52s | Hits:  53%/107888
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 13m | Avg: 32m 13s | Max: 37m 14s | Hits:  37%/4896  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 45m | Avg: 35m 06s | Max: 35m 22s | Hits:  38%/2565  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 27s | Max: 35m 46s | Hits:  38%/3420  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 29s | Max: 38m 21s | Hits:  38%/3420  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 29s | Max: 37m 18s | Hits:  38%/3420  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 43s | Max: 36m 56s | Hits:  38%/3420  
      🟩 Clang15            Pass: 100%/4   | Total:  2h 20m | Avg: 35m 03s | Max: 38m 13s | Hits:  38%/3412  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 21m | Avg: 35m 26s | Max: 37m 46s | Hits:  38%/3412  
      🟩 Clang17            Pass: 100%/26  | Total: 10h 44m | Avg: 24m 46s | Max: 39m 03s | Hits:  74%/21882 
      🟩 GCC6               Pass: 100%/2   | Total: 58m 08s | Avg: 29m 04s | Max: 29m 13s | Hits:  35%/1554  
      🟩 GCC7               Pass: 100%/6   | Total:  3h 16m | Avg: 32m 43s | Max: 38m 04s | Hits:  36%/4899  
      🟩 GCC8               Pass: 100%/6   | Total:  3h 20m | Avg: 33m 23s | Max: 35m 16s | Hits:  36%/4899  
      🟩 GCC9               Pass: 100%/6   | Total:  3h 20m | Avg: 33m 23s | Max: 35m 28s | Hits:  36%/4899  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 23m | Avg: 35m 45s | Max: 37m 28s | Hits:  37%/3420  
      🟩 GCC11              Pass: 100%/7   | Total:  4h 39m | Avg: 39m 57s | Max: 50m 52s | Hits:  37%/5971  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 25m | Avg: 36m 15s | Max: 41m 17s | Hits:  37%/3412  
      🟩 GCC13              Pass: 100%/28  | Total: 10h 49m | Avg: 23m 12s | Max: 36m 44s | Hits:  73%/23884 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 04m | Avg: 41m 21s | Max: 43m 26s | Hits:  35%/2337  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 05s | Avg: 48m 05s | Max: 48m 05s | Hits:  37%/696   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 25m | Avg: 42m 47s | Max: 44m 35s | Hits:  37%/1392  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 14m | Avg: 44m 51s | Max: 46m 21s | Hits:  37%/2088  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 05h | Avg: 30m 07s | Max: 39m 03s | Hits:  54%/49847 
      🟩 GCC                Pass: 100%/63  | Total:  1d 07h | Avg: 29m 43s | Max: 50m 52s | Hits:  53%/52938 
      🟩 Intel              Pass: 100%/3   | Total:  2h 04m | Avg: 41m 21s | Max: 43m 26s | Hits:  35%/2337  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 28m | Avg: 44m 42s | Max: 48m 05s | Hits:  37%/4176  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  2d 19h | Avg: 30m 51s | Max: 50m 52s | Hits:  52%/109298
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 08h | Avg: 34m 20s | Max: 50m 52s | Hits:  37%/82002 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 28m | Avg: 18m 34s | Max: 26m 08s | Hits:  99%/6824  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 09m | Avg: 16m 12s | Max: 21m 58s | Hits:  99%/6824  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 26m | Avg: 18m 21s | Max: 22m 01s | Hits:  99%/6824  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 38m | Avg: 27m 19s | Max: 33m 39s | Hits:  99%/6824  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 23m | Avg: 47m 47s | Max: 50m 52s | Hits:  37%/2559  
      🟩 90a                Pass: 100%/4   | Total:  1h 14m | Avg: 18m 31s | Max: 19m 47s | Hits:  37%/3412  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 16h 33m | Avg: 29m 14s | Max: 50m 52s | Hits:  53%/28571 
      🟩 14                 Pass: 100%/37  | Total: 19h 48m | Avg: 32m 06s | Max: 48m 05s | Hits:  50%/30659 
      🟩 17                 Pass: 100%/36  | Total: 18h 58m | Avg: 31m 37s | Max: 45m 28s | Hits:  51%/29891 
      🟩 20                 Pass: 100%/24  | Total: 12h 02m | Avg: 30m 05s | Max: 45m 07s | Hits:  57%/20177 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 01h | Avg: 24m 55s | Max: 52m 27s | Hits: 68%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 21h | Avg: 24m 52s | Max: 52m 27s | Hits:  69%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 26m | Avg: 25m 45s | Max: 28m 31s | Hits:  63%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 25m | Avg: 25m 40s | Max: 51m 51s | Hits:  62%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 42m | Avg: 34m 07s | Max: 35m 41s | Hits:  63%/3543  
      🟩 12.5               Pass: 100%/100 | Total:  1d 16h | Avg: 24m 32s | Max: 52m 27s | Hits:  69%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 47m 15s | Avg: 23m 37s | Max: 23m 56s | Hits:  62%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 25m | Avg: 25m 40s | Max: 51m 51s | Hits:  62%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 42m | Avg: 34m 07s | Max: 35m 41s | Hits:  63%/3543  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 16h | Avg: 24m 33s | Max: 52m 27s | Hits:  70%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 15s | Avg: 23m 37s | Max: 23m 56s | Hits:  62%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  2d 00h | Avg: 24m 57s | Max: 52m 27s | Hits:  68%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 27m | Avg: 24m 35s | Max: 27m 26s | Hits:  63%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 22m | Avg: 27m 27s | Max: 29m 37s | Hits:  63%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 25s | Max: 31m 14s | Hits:  63%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 41s | Max: 27m 07s | Hits:  63%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 46s | Max: 27m 31s | Hits:  63%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 59s | Max: 27m 36s | Hits:  63%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 52s | Max: 27m 17s | Hits:  63%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 12s | Max: 29m 40s | Hits:  63%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 24m | Avg: 18m 01s | Max: 27m 13s | Hits:  79%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 45m 17s | Avg: 22m 38s | Max: 24m 55s | Hits:  63%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 25m | Avg: 24m 18s | Max: 28m 08s | Hits:  63%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 37m | Avg: 26m 11s | Max: 29m 37s | Hits:  59%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 36m | Avg: 26m 06s | Max: 33m 34s | Hits:  63%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 51m | Avg: 27m 51s | Max: 31m 07s | Hits:  63%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 40m | Avg: 31m 33s | Max: 35m 41s | Hits:  63%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 50m | Avg: 27m 41s | Max: 32m 03s | Hits:  63%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  6h 05m | Avg: 18m 16s | Max: 29m 39s | Hits:  76%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 40m | Avg: 33m 38s | Max: 37m 58s | Hits:  63%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 51s | Avg: 51m 51s | Max: 51m 51s | Hits:  61%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 35m | Avg: 47m 49s | Max: 47m 52s | Hits:  61%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 22m | Avg: 33m 43s | Max: 52m 27s | Hits:  80%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 19h 38m | Avg: 23m 05s | Max: 31m 14s | Hits:  69%/60180 
      🟩 GCC                Pass: 100%/55  | Total: 21h 53m | Avg: 23m 52s | Max: 35m 41s | Hits:  67%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 38s | Max: 37m 58s | Hits:  63%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 49m | Avg: 38m 51s | Max: 52m 27s | Hits:  74%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 01h | Avg: 24m 55s | Max: 52m 27s | Hits:  68%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 21h | Avg: 27m 32s | Max: 52m 27s | Hits:  63%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 41m | Avg:  9m 15s | Max: 18m 20s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 52m | Avg: 14m 06s | Max: 18m 02s | Hits:  97%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 42m | Avg: 34m 07s | Max: 35m 41s | Hits:  63%/3543  
      🟩 90a                Pass: 100%/4   | Total:  1h 01m | Avg: 15m 19s | Max: 16m 46s | Hits:  63%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 38m | Avg: 21m 17s | Max: 31m 03s | Hits:  69%/35418 
      🟩 14                 Pass: 100%/34  | Total: 15h 07m | Avg: 26m 40s | Max: 51m 51s | Hits:  67%/40122 
      🟩 17                 Pass: 100%/33  | Total: 14h 36m | Avg: 26m 34s | Max: 49m 35s | Hits:  68%/38946 
      🟩 20                 Pass: 100%/21  | Total:  8h 39m | Avg: 24m 43s | Max: 52m 27s | Hits:  71%/24780 
    

👃 Inspect Changes

Modifications in project?

    Project
    CCCL Infrastructure
    libcu++
+/- CUB
    Thrust
    CUDA Experimental

Modifications in project or dependencies?

    Project
    CCCL Infrastructure
    libcu++
+/- CUB
+/- Thrust
    CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@bernhardmgruber bernhardmgruber marked this pull request as ready for review July 8, 2024 07:13
@bernhardmgruber bernhardmgruber requested review from a team as code owners July 8, 2024 07:13
@bernhardmgruber bernhardmgruber marked this pull request as draft July 9, 2024 15:14
@bernhardmgruber bernhardmgruber marked this pull request as ready for review July 19, 2024 12:41

🟩 CI finished in 3h 02m: Pass: 100%/250 | Total: 4d 19h | Avg: 27m 39s | Max: 50m 37s | Hits: 57%/248341
  • 🟩 cub: Pass: 100%/131 | Total: 2d 18h | Avg: 30m 33s | Max: 49m 16s | Hits: 41%/109429

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  2d 13h | Avg: 30m 13s | Max: 49m 16s | Hits:  42%/102597
      🟩 arm64              Pass: 100%/8   | Total:  4h 45m | Avg: 35m 41s | Max: 41m 36s | Hits:  26%/6832  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 33m | Avg: 30m 15s | Max: 49m 16s | Hits:  20%/11598 
      🟩 11.8               Pass: 100%/3   | Total:  2h 14m | Avg: 44m 48s | Max: 45m 45s | Hits:  26%/2562  
      🟩 12.5               Pass: 100%/113 | Total:  2d 08h | Avg: 30m 12s | Max: 47m 35s | Hits:  45%/95269 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 43m 17s | Avg: 21m 38s | Max: 22m 04s | Hits:  25%/1412  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 33m | Avg: 30m 15s | Max: 49m 16s | Hits:  20%/11598 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 14m | Avg: 44m 48s | Max: 45m 45s | Hits:  26%/2562  
      🟩 nvcc12.5           Pass: 100%/111 | Total:  2d 08h | Avg: 30m 22s | Max: 47m 35s | Hits:  45%/93857 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 43m 17s | Avg: 21m 38s | Max: 22m 04s | Hits:  25%/1412  
      🟩 nvcc               Pass: 100%/129 | Total:  2d 17h | Avg: 30m 41s | Max: 49m 16s | Hits:  42%/108017
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 12m | Avg: 32m 04s | Max: 37m 01s | Hits:  14%/4902  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 46m | Avg: 35m 39s | Max: 37m 52s | Hits:  16%/2568  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 13s | Max: 35m 46s | Hits:  16%/3424  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 22m | Avg: 35m 33s | Max: 36m 24s | Hits:  16%/3424  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 12s | Max: 34m 48s | Hits:  16%/3424  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 13m | Avg: 33m 22s | Max: 34m 25s | Hits:  26%/3424  
      🟩 Clang15            Pass: 100%/4   | Total:  2h 13m | Avg: 33m 27s | Max: 34m 38s | Hits:  26%/3416  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 01s | Max: 34m 51s | Hits:  26%/3416  
      🟩 Clang17            Pass: 100%/26  | Total: 10h 34m | Avg: 24m 24s | Max: 44m 25s | Hits:  72%/21908 
      🟩 GCC6               Pass: 100%/2   | Total: 57m 39s | Avg: 28m 49s | Max: 28m 56s | Hits:  23%/1556  
      🟩 GCC7               Pass: 100%/6   | Total:  3h 02m | Avg: 30m 25s | Max: 32m 56s | Hits:  24%/4905  
      🟩 GCC8               Pass: 100%/6   | Total:  3h 07m | Avg: 31m 19s | Max: 34m 37s | Hits:  24%/4905  
      🟩 GCC9               Pass: 100%/6   | Total:  3h 24m | Avg: 34m 02s | Max: 40m 26s | Hits:  24%/4905  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 20m | Avg: 35m 12s | Max: 36m 25s | Hits:  26%/3424  
      🟩 GCC11              Pass: 100%/7   | Total:  4h 32m | Avg: 38m 51s | Max: 45m 45s | Hits:  26%/5978  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 23m | Avg: 35m 53s | Max: 38m 27s | Hits:  26%/3416  
      🟩 GCC13              Pass: 100%/28  | Total: 11h 06m | Avg: 23m 47s | Max: 41m 36s | Hits:  68%/23912 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 58m | Avg: 39m 30s | Max: 41m 01s | Hits:  12%/2340  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 49m 16s | Avg: 49m 16s | Max: 49m 16s | Hits:  11%/697   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 27m | Avg: 43m 34s | Max: 44m 14s | Hits:  11%/1394  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 19m | Avg: 46m 20s | Max: 47m 35s | Hits:  11%/2091  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 05h | Avg: 29m 43s | Max: 44m 25s | Hits:  43%/49906 
      🟩 GCC                Pass: 100%/63  | Total:  1d 06h | Avg: 29m 26s | Max: 45m 45s | Hits:  44%/53001 
      🟩 Intel              Pass: 100%/3   | Total:  1h 58m | Avg: 39m 30s | Max: 41m 01s | Hits:  12%/2340  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 35m | Avg: 45m 54s | Max: 49m 16s | Hits:  11%/4182  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  2d 18h | Avg: 30m 33s | Max: 49m 16s | Hits:  41%/109429
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 08h | Avg: 34m 14s | Max: 49m 16s | Hits:  22%/82101 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 19m | Avg: 17m 23s | Max: 18m 30s | Hits:  99%/6832  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 02m | Avg: 15m 19s | Max: 16m 35s | Hits:  99%/6832  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 20m | Avg: 17m 31s | Max: 20m 35s | Hits:  99%/6832  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 30m | Avg: 26m 22s | Max: 44m 25s | Hits:  99%/6832  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 14m | Avg: 44m 48s | Max: 45m 45s | Hits:  26%/2562  
      🟩 90a                Pass: 100%/4   | Total:  1h 15m | Avg: 18m 58s | Max: 21m 02s | Hits:  26%/3416  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 17h 15m | Avg: 30m 26s | Max: 44m 25s | Hits:  41%/28605 
      🟩 14                 Pass: 100%/37  | Total: 19h 25m | Avg: 31m 29s | Max: 49m 16s | Hits:  39%/30696 
      🟩 17                 Pass: 100%/36  | Total: 18h 21m | Avg: 30m 36s | Max: 44m 47s | Hits:  40%/29927 
      🟩 20                 Pass: 100%/24  | Total: 11h 40m | Avg: 29m 11s | Max: 47m 35s | Hits:  49%/20201 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 00h | Avg: 24m 35s | Max: 50m 37s | Hits: 69%/138912

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 20h | Avg: 24m 29s | Max: 50m 37s | Hits:  69%/129492
      🟩 arm64              Pass: 100%/8   | Total:  3h 27m | Avg: 25m 57s | Max: 28m 57s | Hits:  63%/9420  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 01m | Avg: 24m 05s | Max: 45m 55s | Hits:  63%/17660 
      🟩 11.8               Pass: 100%/3   | Total:  1h 37m | Avg: 32m 37s | Max: 35m 22s | Hits:  63%/3534  
      🟩 12.5               Pass: 100%/100 | Total:  1d 16h | Avg: 24m 25s | Max: 50m 37s | Hits:  70%/117718
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 46m 46s | Avg: 23m 23s | Max: 23m 31s | Hits:  62%/2354  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 01m | Avg: 24m 05s | Max: 45m 55s | Hits:  63%/17660 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 37m | Avg: 32m 37s | Max: 35m 22s | Hits:  63%/3534  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 15h | Avg: 24m 27s | Max: 50m 37s | Hits:  70%/115364
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 46m 46s | Avg: 23m 23s | Max: 23m 31s | Hits:  62%/2354  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 23h | Avg: 24m 36s | Max: 50m 37s | Hits:  69%/136558
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 26m | Avg: 24m 20s | Max: 30m 33s | Hits:  63%/7062  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 17m | Avg: 25m 41s | Max: 27m 58s | Hits:  63%/3531  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 31s | Max: 26m 55s | Hits:  63%/4708  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 55s | Max: 31m 17s | Hits:  63%/4708  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 11s | Max: 26m 23s | Hits:  63%/4708  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 53s | Max: 28m 01s | Hits:  63%/4708  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 05s | Max: 27m 15s | Hits:  63%/4708  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 40s | Max: 28m 07s | Hits:  63%/4708  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 33m | Avg: 18m 31s | Max: 27m 17s | Hits:  79%/21186 
      🟩 GCC6               Pass: 100%/2   | Total: 43m 30s | Avg: 21m 45s | Max: 24m 33s | Hits:  63%/2354  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 28m | Avg: 24m 40s | Max: 28m 10s | Hits:  63%/7068  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 28m | Avg: 24m 48s | Max: 30m 16s | Hits:  63%/7068  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 27m | Avg: 24m 34s | Max: 29m 34s | Hits:  63%/7068  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 52m | Avg: 28m 08s | Max: 29m 26s | Hits:  63%/4712  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 25m | Avg: 29m 23s | Max: 35m 22s | Hits:  63%/8246  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 49m | Avg: 27m 15s | Max: 30m 07s | Hits:  63%/4712  
      🟩 GCC13              Pass: 100%/20  | Total:  6h 15m | Avg: 18m 46s | Max: 28m 57s | Hits:  77%/23560 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 35m | Avg: 31m 51s | Max: 35m 32s | Hits:  63%/3540  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 55s | Avg: 45m 55s | Max: 45m 55s | Hits:  61%/1173  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 35m | Avg: 47m 56s | Max: 48m 26s | Hits:  61%/2346  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 20m | Avg: 33m 26s | Max: 50m 37s | Hits:  80%/7038  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 19h 33m | Avg: 23m 00s | Max: 31m 17s | Hits:  69%/60027 
      🟩 GCC                Pass: 100%/55  | Total: 21h 30m | Avg: 23m 27s | Max: 35m 22s | Hits:  68%/64788 
      🟩 Intel              Pass: 100%/3   | Total:  1h 35m | Avg: 31m 51s | Max: 35m 32s | Hits:  63%/3540  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 42m | Avg: 38m 02s | Max: 50m 37s | Hits:  73%/10557 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 00h | Avg: 24m 35s | Max: 50m 37s | Hits:  69%/138912
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 20h | Avg: 26m 55s | Max: 50m 37s | Hits:  63%/116553
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 42m | Avg:  9m 21s | Max: 18m 32s | Hits:  99%/12939 
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 12m | Avg: 16m 37s | Max: 22m 24s | Hits:  99%/9420  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 37m | Avg: 32m 37s | Max: 35m 22s | Hits:  63%/3534  
      🟩 90a                Pass: 100%/4   | Total:  1h 01m | Avg: 15m 23s | Max: 17m 33s | Hits:  63%/4712  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 17m | Avg: 20m 34s | Max: 28m 03s | Hits:  69%/35328 
      🟩 14                 Pass: 100%/34  | Total: 14h 53m | Avg: 26m 16s | Max: 48m 26s | Hits:  67%/40020 
      🟩 17                 Pass: 100%/33  | Total: 14h 23m | Avg: 26m 10s | Max: 49m 00s | Hits:  68%/38847 
      🟩 20                 Pass: 100%/21  | Total:  8h 48m | Avg: 25m 09s | Max: 50m 37s | Hits:  71%/24717 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s
    

👃 Inspect Changes

Modifications in project?

    Project
    CCCL Infrastructure
    libcu++
+/- CUB
    Thrust
    CUDA Experimental
    pycuda

Modifications in project or dependencies?

    Project
    CCCL Infrastructure
    libcu++
+/- CUB
+/- Thrust
    CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@bernhardmgruber bernhardmgruber merged commit 6dfc8dd into NVIDIA:main Jul 30, 2024
267 checks passed
@bernhardmgruber bernhardmgruber deleted the blockload_ref_no_comment branch July 30, 2024 07:54
pciolkosz pushed a commit to pciolkosz/cccl that referenced this pull request Aug 4, 2024