Fix bug remaining on thrust::inclusive_scan with init value with CDP #2346

Merged
merged 1 commit into from
Sep 3, 2024

gonidelis
Member

@gonidelis gonidelis commented Sep 3, 2024

Fixes a bug left over from #1940 in the CDP (CUDA Dynamic Parallelism) code path of thrust::inclusive_scan with an initial value. Also adds a test for that path; its omission is why the bug was overlooked.
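
For readers unfamiliar with the semantics in question, here is a minimal host-side sketch of what an inclusive scan seeded with an initial value computes. It uses the C++17 `std::inclusive_scan` as a reference, not the Thrust implementation touched by this PR; the helper name `inclusive_scan_with_init` is hypothetical, and the Thrust overload may order its arguments differently.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Thrust or this PR): host-side reference
// for an inclusive scan seeded with an initial value, i.e.
// out[i] = init + in[0] + ... + in[i].
std::vector<int> inclusive_scan_with_init(const std::vector<int>& in, int init)
{
  std::vector<int> out(in.size());
  // std::inclusive_scan takes the binary operator before the initial value.
  std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<int>{}, init);
  return out;
}
```

With input `{1, 2, 3, 4}` and an initial value of `10`, the result is `{11, 13, 16, 20}`: the initial value is folded into every output element, including the first. A test for the CDP path would presumably compare device-side results against a reference of this kind.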

#2326 relies on it.

@gonidelis gonidelis enabled auto-merge (squash) September 3, 2024 03:31
Contributor

github-actions bot commented Sep 3, 2024

🟩 CI finished in 2h 50m: Pass: 100%/251 | Total: 6d 00h | Avg: 34m 36s | Max: 1h 07m | Hits: 80%/24375
  • 🟩 cub: Pass: 100%/132 | Total: 3d 21h | Avg: 42m 16s | Max: 1h 07m | Hits: 65%/4296

    🟩 cpu
      🟩 amd64              Pass: 100%/124 | Total:  3d 13h | Avg: 41m 33s | Max:  1h 07m | Hits:  65%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  7h 08m | Avg: 53m 31s | Max: 55m 41s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  9h 52m | Avg: 39m 30s | Max: 52m 05s | Hits:  65%/716   
      🟩 11.8               Pass: 100%/3   | Total:  2h 48m | Avg: 56m 18s | Max: 57m 20s
      🟩 12.5               Pass: 100%/114 | Total:  3d 08h | Avg: 42m 16s | Max:  1h 07m | Hits:  65%/3580  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 44m 14s | Avg: 22m 07s | Max: 22m 52s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  9h 52m | Avg: 39m 30s | Max: 52m 05s | Hits:  65%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 48m | Avg: 56m 18s | Max: 57m 20s
      🟩 nvcc12.5           Pass: 100%/112 | Total:  3d 07h | Avg: 42m 37s | Max:  1h 07m | Hits:  65%/3580  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 14s | Avg: 22m 07s | Max: 22m 52s
      🟩 nvcc               Pass: 100%/130 | Total:  3d 20h | Avg: 42m 35s | Max:  1h 07m | Hits:  65%/4296  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 43m | Avg: 47m 17s | Max: 50m 06s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 34m | Avg: 51m 29s | Max: 52m 46s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 27s | Max: 54m 11s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 27m | Avg: 51m 53s | Max: 54m 21s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 40s | Max: 53m 38s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 20s | Max: 52m 58s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 31m | Avg: 52m 55s | Max: 54m 56s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 34m | Avg: 53m 44s | Max: 56m 51s
      🟩 Clang17            Pass: 100%/26  | Total: 13h 23m | Avg: 30m 55s | Max: 54m 53s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 12m | Avg: 36m 18s | Max: 36m 37s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 23m | Avg: 43m 54s | Max: 50m 59s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 33m | Avg: 45m 32s | Max: 55m 17s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 35m | Avg: 45m 53s | Max: 57m 42s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 36m | Avg: 54m 01s | Max: 58m 45s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 12m | Avg: 53m 16s | Max: 57m 20s
      🟩 GCC12              Pass: 100%/4   | Total:  3h 26m | Avg: 51m 33s | Max: 54m 37s
      🟩 GCC13              Pass: 100%/29  | Total: 14h 43m | Avg: 30m 27s | Max: 55m 41s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 50m | Avg: 56m 48s | Max: 59m 43s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 52m 05s | Avg: 52m 05s | Max: 52m 05s | Hits:  65%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 29s | Max: 59m 19s | Hits:  65%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 12m | Avg:  1h 04m | Max:  1h 07m | Hits:  65%/2148  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 17h | Avg: 42m 08s | Max: 56m 51s
      🟩 GCC                Pass: 100%/64  | Total:  1d 18h | Avg: 40m 02s | Max: 58m 45s
      🟩 Intel              Pass: 100%/3   | Total:  2h 50m | Avg: 56m 48s | Max: 59m 43s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 01m | Avg:  1h 00m | Max:  1h 07m | Hits:  65%/4296  
    🟩 gpu
      🟩 v100               Pass: 100%/132 | Total:  3d 21h | Avg: 42m 16s | Max:  1h 07m | Hits:  65%/4296  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 09h | Avg: 49m 06s | Max:  1h 07m | Hits:  65%/4296  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 44m | Avg: 20m 37s | Max: 24m 11s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 07m | Avg: 15m 57s | Max: 17m 32s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 41m | Avg: 20m 08s | Max: 22m 22s
      🟩 SmallGMem          Pass: 100%/1   | Total: 36m 28s | Avg: 36m 28s | Max: 36m 28s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 48m | Avg: 28m 34s | Max: 35m 28s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 48m | Avg: 56m 18s | Max: 57m 20s
      🟩 90a                Pass: 100%/4   | Total:  1h 26m | Avg: 21m 42s | Max: 22m 46s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 23h 45m | Avg: 41m 55s | Max: 56m 51s
      🟩 14                 Pass: 100%/37  | Total:  1d 02h | Avg: 43m 36s | Max: 59m 01s | Hits:  65%/2148  
      🟩 17                 Pass: 100%/37  | Total:  1d 02h | Avg: 42m 44s | Max:  1h 07m | Hits:  65%/1432  
      🟩 20                 Pass: 100%/24  | Total: 16h 00m | Avg: 40m 01s | Max:  1h 05m | Hits:  65%/716   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 03h | Avg: 26m 14s | Max: 57m 28s | Hits: 83%/20079

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  2d 00h | Avg: 26m 11s | Max: 57m 28s | Hits:  83%/20079 
      🟩 arm64              Pass: 100%/8   | Total:  3h 36m | Avg: 27m 02s | Max: 30m 41s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 15m | Avg: 25m 03s | Max: 45m 27s | Hits:  77%/2231  
      🟩 11.8               Pass: 100%/3   | Total:  1h 35m | Avg: 31m 53s | Max: 34m 13s
      🟩 12.5               Pass: 100%/100 | Total:  1d 19h | Avg: 26m 15s | Max: 57m 28s | Hits:  83%/17848 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 51m 38s | Avg: 25m 49s | Max: 26m 00s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 15m | Avg: 25m 03s | Max: 45m 27s | Hits:  77%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 35m | Avg: 31m 53s | Max: 34m 13s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 18h | Avg: 26m 16s | Max: 57m 28s | Hits:  83%/17848 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 38s | Avg: 25m 49s | Max: 26m 00s
      🟩 nvcc               Pass: 100%/116 | Total:  2d 02h | Avg: 26m 15s | Max: 57m 28s | Hits:  83%/20079 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 36m | Avg: 26m 05s | Max: 30m 50s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 22m | Avg: 27m 33s | Max: 29m 27s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 51m | Avg: 27m 58s | Max: 30m 48s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 57s | Max: 29m 49s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 49m | Avg: 27m 19s | Max: 28m 23s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 04s | Max: 29m 31s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 50m | Avg: 27m 31s | Max: 31m 46s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 51m | Avg: 27m 58s | Max: 32m 21s
      🟩 Clang17            Pass: 100%/18  | Total:  5h 50m | Avg: 19m 28s | Max: 32m 51s
      🟩 GCC6               Pass: 100%/2   | Total: 42m 40s | Avg: 21m 20s | Max: 24m 45s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 33m | Avg: 25m 30s | Max: 29m 48s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 36m | Avg: 26m 08s | Max: 30m 37s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 39m | Avg: 26m 39s | Max: 33m 59s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 36s | Max: 31m 47s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 34m | Avg: 30m 42s | Max: 34m 13s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 04m | Avg: 31m 12s | Max: 35m 19s
      🟩 GCC13              Pass: 100%/20  | Total:  6h 24m | Avg: 19m 12s | Max: 34m 04s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 47m | Avg: 35m 47s | Max: 43m 26s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 27s | Avg: 45m 27s | Max: 45m 27s | Hits:  77%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 51m | Avg: 55m 30s | Max: 57m 28s | Hits:  74%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 53m | Avg: 38m 52s | Max: 57m 16s | Hits:  86%/13386 
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 20h 49m | Avg: 24m 29s | Max: 32m 51s
      🟩 GCC                Pass: 100%/55  | Total: 22h 31m | Avg: 24m 33s | Max: 35m 19s
      🟩 Intel              Pass: 100%/3   | Total:  1h 47m | Avg: 35m 47s | Max: 43m 26s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 29m | Avg: 43m 17s | Max: 57m 28s | Hits:  83%/20079 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 03h | Avg: 26m 14s | Max: 57m 28s | Hits:  83%/20079 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 23h | Avg: 28m 57s | Max: 57m 28s | Hits:  74%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 04m | Avg: 11m 21s | Max: 24m 24s | Hits:  99%/6693  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 45m | Avg: 13m 12s | Max: 16m 38s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 35m | Avg: 31m 53s | Max: 34m 13s
      🟩 90a                Pass: 100%/4   | Total:  1h 04m | Avg: 16m 13s | Max: 18m 21s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 33m | Avg: 21m 06s | Max: 28m 57s
      🟩 14                 Pass: 100%/34  | Total: 15h 41m | Avg: 27m 41s | Max: 53m 59s | Hits:  81%/8924  
      🟩 17                 Pass: 100%/33  | Total: 15h 56m | Avg: 28m 59s | Max: 57m 28s | Hits:  82%/6693  
      🟩 20                 Pass: 100%/21  | Total:  9h 25m | Avg: 26m 56s | Max: 57m 16s | Hits:  86%/4462  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CUDA C Core Library

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@gonidelis gonidelis merged commit 498251c into NVIDIA:main Sep 3, 2024
268 checks passed