Add a test for Thrust scan with non-commutative op #2024

Merged
merged 2 commits into from
Jul 25, 2024

Conversation

bernhardmgruber
Contributor

Thrust's scan algorithms do not require a commutative scan operation, so here is a test for that.

@bernhardmgruber bernhardmgruber added the thrust For all items related to Thrust. label Jul 23, 2024
@bernhardmgruber bernhardmgruber marked this pull request as ready for review July 23, 2024 10:43
@bernhardmgruber bernhardmgruber requested review from a team as code owners July 23, 2024 10:43
@bernhardmgruber bernhardmgruber force-pushed the scan_minus branch 2 times, most recently from a73398e to 3ab3dee Compare July 23, 2024 10:45
@elstehle
Collaborator

Thanks for adding a test for a non-commutative operation!

However, we rely on the operation being associative, and subtraction is not associative. I think something like composition should work. I will try to follow up with a suggestion shortly.
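To make the associativity point concrete: (10 - 1) - 2 = 7, but 10 - (1 - 2) = 11. The sketch below is only an illustration and not how Thrust or CUB implement scan. It compares a strict left-to-right scan with a simplified two-chunk scan that regroups the operands, which is the kind of regrouping a parallel scan is free to do. The function names `sequential_scan` and `chunked_scan` are made up for this example.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Reference: strict left-to-right inclusive scan with a binary op.
template <class Op>
std::vector<int> sequential_scan(const std::vector<int>& in, Op op)
{
  std::vector<int> out(in.size());
  std::partial_sum(in.begin(), in.end(), out.begin(), op);
  return out;
}

// Simplified stand-in for a parallel scan (hypothetical, for illustration):
// scan each half on its own, then fold the first half's total into every
// element of the second half. This regroups operands, so it only matches
// the sequential result when op is associative.
template <class Op>
std::vector<int> chunked_scan(const std::vector<int>& in, Op op)
{
  const std::size_t mid = in.size() / 2;
  std::vector<int> out(in.size());
  std::partial_sum(in.begin(), in.begin() + mid, out.begin(), op);
  std::partial_sum(in.begin() + mid, in.end(), out.begin() + mid, op);
  if (mid > 0)
  {
    const int carry = out[mid - 1];
    for (std::size_t i = mid; i < out.size(); ++i)
    {
      out[i] = op(carry, out[i]);
    }
  }
  return out;
}
```

For the input `{10, 1, 2, 3}`, both functions agree with `std::plus<int>`. With `std::minus<int>` the sequential scan yields `{10, 9, 7, 4}`, while the chunked version yields `{10, 9, 7, 10}`, so a test built on subtraction could fail even though the scan itself is correct.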

@bernhardmgruber
Contributor Author

> Thanks for adding a test for a non-commutative operation!
>
> However, we rely on the operation being associative, and subtraction is not associative. I think something like composition should work. I will try to follow up with a suggestion shortly.

Thanks for publicly shaming me :D You are totally right! I will revise this PR.

@bernhardmgruber bernhardmgruber marked this pull request as draft July 23, 2024 13:29
@bernhardmgruber bernhardmgruber requested review from elstehle and removed request for elstehle July 23, 2024 22:33
@bernhardmgruber
Contributor Author

@elstehle I implemented composition of permutations as an example of a non-commutative but associative operation. Thank you for this great idea!
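A minimal host-side sketch of why permutation composition fits, assuming a fixed-size permutation stored as an array of target indices. The names `perm`, `compose`, and `scan_permutations` are hypothetical and are not the types used in the PR. `std::partial_sum` serves as the sequential reference here; on the Thrust side the same functor (marked `__host__ __device__`) would presumably be passed to the `thrust::inclusive_scan` overload that takes a binary operation.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical permutation type: p[i] is the index that i is mapped to.
using perm = std::array<int, 4>;

// "Apply a, then b": result[i] = b[a[i]].
// Function composition is associative, but in general not commutative.
struct compose
{
  perm operator()(const perm& a, const perm& b) const
  {
    perm r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
      r[i] = b[a[i]];
    }
    return r;
  }
};

// Sequential inclusive scan: out[k] = in[0] composed with in[1] ... in[k].
std::vector<perm> scan_permutations(const std::vector<perm>& in)
{
  std::vector<perm> out(in.size());
  std::partial_sum(in.begin(), in.end(), out.begin(), compose{});
  return out;
}
```

With `a = {1, 0, 2, 3}` and `b = {0, 2, 1, 3}`, `compose{}(a, b)` is `{2, 0, 1, 3}` while `compose{}(b, a)` is `{1, 2, 0, 3}`, so operand order matters. Any regrouping of a longer chain gives the same result, which is what a scan implementation relies on.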

@bernhardmgruber bernhardmgruber marked this pull request as ready for review July 23, 2024 22:34
Collaborator

@elstehle elstehle left a comment

That looks great! Thanks for the work! 👍

Contributor

🟩 CI finished in 5h 04m: Pass: 100%/250 | Total: 3d 02h | Avg: 17m 54s | Max: 1h 08m | Hits: 76%/250036
  • 🟩 cub: Pass: 100%/131 | Total: 19h 48m | Avg: 9m 04s | Max: 52m 03s | Hits: 99%/111124

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total: 19h 13m | Avg:  9m 22s | Max: 52m 03s | Hits:  99%/104188
      🟩 arm64              Pass: 100%/8   | Total: 35m 38s | Avg:  4m 27s | Max:  5m 11s | Hits:  99%/6936  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 04m | Avg:  4m 17s | Max: 13m 20s | Hits:  99%/11792 
      🟩 11.8               Pass: 100%/3   | Total: 14m 46s | Avg:  4m 55s | Max:  5m 16s | Hits:  99%/2601  
      🟩 12.5               Pass: 100%/113 | Total: 18h 29m | Avg:  9m 49s | Max: 52m 03s | Hits:  99%/96731 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 36s | Avg:  3m 48s | Max:  3m 48s | Hits: 100%/1436  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 04m | Avg:  4m 17s | Max: 13m 20s | Hits:  99%/11792 
      🟩 nvcc11.8           Pass: 100%/3   | Total: 14m 46s | Avg:  4m 55s | Max:  5m 16s | Hits:  99%/2601  
      🟩 nvcc12.5           Pass: 100%/111 | Total: 18h 22m | Avg:  9m 55s | Max: 52m 03s | Hits:  99%/95295 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 36s | Avg:  3m 48s | Max:  3m 48s | Hits: 100%/1436  
      🟩 nvcc               Pass: 100%/129 | Total: 19h 41m | Avg:  9m 09s | Max: 52m 03s | Hits:  99%/109688
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 26m 39s | Avg:  4m 26s | Max:  5m 20s | Hits: 100%/4980  
      🟩 Clang10            Pass: 100%/3   | Total: 15m 46s | Avg:  5m 15s | Max:  5m 25s | Hits: 100%/2607  
      🟩 Clang11            Pass: 100%/4   | Total: 17m 44s | Avg:  4m 26s | Max:  4m 49s | Hits: 100%/3476  
      🟩 Clang12            Pass: 100%/4   | Total: 18m 02s | Avg:  4m 30s | Max:  4m 59s | Hits: 100%/3476  
      🟩 Clang13            Pass: 100%/4   | Total: 18m 44s | Avg:  4m 41s | Max:  4m 52s | Hits: 100%/3476  
      🟩 Clang14            Pass: 100%/4   | Total: 18m 32s | Avg:  4m 38s | Max:  4m 44s | Hits: 100%/3476  
      🟩 Clang15            Pass: 100%/4   | Total: 18m 15s | Avg:  4m 33s | Max:  4m 44s | Hits: 100%/3468  
      🟩 Clang16            Pass: 100%/4   | Total: 18m 11s | Avg:  4m 32s | Max:  4m 44s | Hits: 100%/3468  
      🟩 Clang17            Pass: 100%/26  | Total:  6h 14m | Avg: 14m 23s | Max: 28m 08s | Hits:  99%/22244 
      🟩 GCC6               Pass: 100%/2   | Total:  7m 19s | Avg:  3m 39s | Max:  3m 41s | Hits:  99%/1582  
      🟩 GCC7               Pass: 100%/6   | Total: 23m 11s | Avg:  3m 51s | Max:  4m 13s | Hits:  99%/4983  
      🟩 GCC8               Pass: 100%/6   | Total: 23m 12s | Avg:  3m 52s | Max:  4m 24s | Hits:  99%/4983  
      🟩 GCC9               Pass: 100%/6   | Total: 23m 55s | Avg:  3m 59s | Max:  4m 35s | Hits:  99%/4983  
      🟩 GCC10              Pass: 100%/4   | Total: 18m 35s | Avg:  4m 38s | Max:  4m 56s | Hits:  99%/3476  
      🟩 GCC11              Pass: 100%/7   | Total: 33m 00s | Avg:  4m 42s | Max:  5m 16s | Hits:  99%/6069  
      🟩 GCC12              Pass: 100%/4   | Total: 19m 21s | Avg:  4m 50s | Max:  5m 15s | Hits:  99%/3468  
      🟩 GCC13              Pass: 100%/28  | Total:  7h 12m | Avg: 15m 26s | Max: 52m 03s | Hits:  99%/24276 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 15m 50s | Avg:  5m 16s | Max:  5m 31s | Hits: 100%/2379  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 13m 20s | Avg: 13m 20s | Max: 13m 20s | Hits:  99%/709   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 19m 43s | Avg:  9m 51s | Max: 10m 12s | Hits:  99%/1418  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 33m 02s | Avg: 11m 00s | Max: 11m 31s | Hits:  99%/2127  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  8h 46m | Avg:  8m 55s | Max: 28m 08s | Hits:  99%/50671 
      🟩 GCC                Pass: 100%/63  | Total:  9h 40m | Avg:  9m 13s | Max: 52m 03s | Hits:  99%/53820 
      🟩 Intel              Pass: 100%/3   | Total: 15m 50s | Avg:  5m 16s | Max:  5m 31s | Hits: 100%/2379  
      🟩 MSVC               Pass: 100%/6   | Total:  1h 06m | Avg: 11m 00s | Max: 13m 20s | Hits:  99%/4254  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total: 19h 48m | Avg:  9m 04s | Max: 52m 03s | Hits:  99%/111124
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  7h 57m | Avg:  4m 49s | Max: 13m 20s | Hits:  99%/83380 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 36m | Avg: 19m 32s | Max: 21m 35s | Hits:  99%/6936  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 15m | Avg: 16m 56s | Max: 18m 22s | Hits:  99%/6936  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 58m | Avg: 22m 18s | Max: 52m 03s | Hits:  99%/6936  
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 00m | Avg: 30m 04s | Max: 46m 51s | Hits:  99%/6936  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 14m 46s | Avg:  4m 55s | Max:  5m 16s | Hits:  99%/2601  
      🟩 90a                Pass: 100%/4   | Total: 14m 34s | Avg:  3m 38s | Max:  3m 45s | Hits:  99%/3468  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  4h 22m | Avg:  7m 42s | Max: 26m 01s | Hits:  99%/29047 
      🟩 14                 Pass: 100%/37  | Total:  5h 05m | Avg:  8m 15s | Max: 27m 44s | Hits:  99%/31174 
      🟩 17                 Pass: 100%/36  | Total:  5h 26m | Avg:  9m 04s | Max: 46m 51s | Hits:  99%/30392 
      🟩 20                 Pass: 100%/24  | Total:  4h 54m | Avg: 12m 15s | Max: 52m 03s | Hits:  99%/20511 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 06h | Avg: 27m 45s | Max: 1h 08m | Hits: 58%/138912

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  2d 02h | Avg: 27m 37s | Max:  1h 08m | Hits:  58%/129492
      🟩 arm64              Pass: 100%/8   | Total:  3h 56m | Avg: 29m 34s | Max: 31m 39s | Hits:  48%/9420  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 57m | Avg: 27m 51s | Max: 50m 14s | Hits:  48%/17660 
      🟩 11.8               Pass: 100%/3   | Total:  1h 56m | Avg: 38m 43s | Max: 41m 15s | Hits:  49%/3534  
      🟩 12.5               Pass: 100%/100 | Total:  1d 21h | Avg: 27m 24s | Max:  1h 08m | Hits:  59%/117718
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 58m 12s | Avg: 29m 06s | Max: 29m 30s | Hits:  48%/2354  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 57m | Avg: 27m 51s | Max: 50m 14s | Hits:  48%/17660 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 56m | Avg: 38m 43s | Max: 41m 15s | Hits:  49%/3534  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 20h | Avg: 27m 22s | Max:  1h 08m | Hits:  60%/115364
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 58m 12s | Avg: 29m 06s | Max: 29m 30s | Hits:  48%/2354  
      🟩 nvcc               Pass: 100%/116 | Total:  2d 05h | Avg: 27m 43s | Max:  1h 08m | Hits:  58%/136558
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 47m | Avg: 27m 52s | Max: 34m 33s | Hits:  48%/7062  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 27m | Avg: 29m 02s | Max: 31m 05s | Hits:  48%/3531  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 52s | Max: 31m 54s | Hits:  48%/4708  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 56m | Avg: 29m 14s | Max: 31m 21s | Hits:  48%/4708  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 53s | Max: 32m 50s | Hits:  48%/4708  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 27s | Max: 35m 02s | Hits:  48%/4708  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 57m | Avg: 29m 17s | Max: 31m 47s | Hits:  48%/4708  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 45s | Max: 37m 24s | Hits:  48%/4708  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 56m | Avg: 19m 47s | Max: 32m 20s | Hits:  74%/21186 
      🟩 GCC6               Pass: 100%/2   | Total: 50m 20s | Avg: 25m 10s | Max: 27m 33s | Hits:  49%/2354  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 43m | Avg: 27m 17s | Max: 30m 15s | Hits:  48%/7068  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 48m | Avg: 28m 00s | Max: 32m 14s | Hits:  48%/7068  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 51m | Avg: 28m 35s | Max: 32m 10s | Hits:  48%/7068  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 01m | Avg: 30m 19s | Max: 33m 00s | Hits:  48%/4712  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 57m | Avg: 33m 56s | Max: 41m 15s | Hits:  49%/8246  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 07m | Avg: 31m 50s | Max: 34m 41s | Hits:  48%/4712  
      🟩 GCC13              Pass: 100%/20  | Total:  6h 31m | Avg: 19m 35s | Max: 31m 39s | Hits:  74%/23560 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 53m | Avg: 37m 52s | Max: 41m 18s | Hits:  48%/3540  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 50m 14s | Avg: 50m 14s | Max: 50m 14s | Hits:  46%/1173  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 45m | Avg: 52m 59s | Max: 55m 18s | Hits:  46%/2346  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 53m | Avg: 38m 50s | Max:  1h 08m | Hits:  72%/7038  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 22h 20m | Avg: 26m 17s | Max: 37m 24s | Hits:  57%/60027 
      🟩 GCC                Pass: 100%/55  | Total: 23h 51m | Avg: 26m 01s | Max: 41m 15s | Hits:  58%/64788 
      🟩 Intel              Pass: 100%/3   | Total:  1h 53m | Avg: 37m 52s | Max: 41m 18s | Hits:  48%/3540  
      🟩 MSVC               Pass: 100%/9   | Total:  6h 29m | Avg: 43m 14s | Max:  1h 08m | Hits:  64%/10557 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 06h | Avg: 27m 45s | Max:  1h 08m | Hits:  58%/138912
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 02h | Avg: 30m 49s | Max:  1h 08m | Hits:  50%/116553
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 46m | Avg:  9m 40s | Max: 20m 40s | Hits:  99%/12939 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 57m | Avg: 14m 42s | Max: 20m 02s | Hits:  99%/9420  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 56m | Avg: 38m 43s | Max: 41m 15s | Hits:  49%/3534  
      🟩 90a                Pass: 100%/4   | Total:  1h 16m | Avg: 19m 06s | Max: 20m 03s | Hits:  48%/4712  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 11h 58m | Avg: 23m 57s | Max: 35m 00s | Hits:  58%/35328 
      🟩 14                 Pass: 100%/34  | Total: 16h 18m | Avg: 28m 46s | Max: 52m 51s | Hits:  58%/40020 
      🟩 17                 Pass: 100%/33  | Total: 16h 19m | Avg: 29m 40s | Max: 55m 44s | Hits:  55%/38847 
      🟩 20                 Pass: 100%/21  | Total:  9h 58m | Avg: 28m 30s | Max:  1h 08m | Hits:  61%/24717 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 12m 13s | Avg: 12m 13s | Max: 12m 13s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@bernhardmgruber
Contributor Author

@elstehle please have another look! I had to fix something in the Thrust testing framework: before C++17, it failed to compile when it tried to print a permutation_t.
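The PR diff is not shown in this conversation, so the following is only a guess at the general shape of such a problem, not the actual change. Before C++17 there is no `if constexpr`, so a plain `if` on a compile-time condition still compiles both branches, and a branch that uses `operator+` breaks for a type that has none. One C++11-compatible workaround is tag dispatch. All names below (`tag_value`, `print_element`, `to_printable`) are hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical non-addable element type: streamable, but no operator+.
struct tag_value
{
  int id;
};

std::ostream& operator<<(std::ostream& os, const tag_value& v)
{
  return os << "tag(" << v.id << ")";
}

// Selected for 1-byte arithmetic types: promote to int so the value is
// printed as a number rather than as a (possibly unprintable) character.
template <class T>
void print_element(std::ostream& os, const T& v, std::true_type)
{
  os << (v + 0);
}

// Selected for everything else: stream directly, operator+ is never needed.
template <class T>
void print_element(std::ostream& os, const T& v, std::false_type)
{
  os << v;
}

// Picks the overload at compile time; only the chosen one is instantiated.
template <class T>
std::string to_printable(const T& v)
{
  std::ostringstream os;
  print_element(os, v, std::integral_constant<bool, std::is_arithmetic<T>::value && sizeof(T) == 1>{});
  return os.str();
}
```

Because only the selected overload is instantiated, `to_printable(tag_value{7})` compiles even though `tag_value` has no `operator+`, while `to_printable('A')` still takes the promoting path.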

Collaborator

@elstehle elstehle left a comment

Looks good. Thanks for fixing printing in the assertion!

@bernhardmgruber bernhardmgruber merged commit 5ba23b6 into NVIDIA:main Jul 25, 2024
261 of 264 checks passed
@bernhardmgruber bernhardmgruber deleted the scan_minus branch July 25, 2024 08:25
pciolkosz pushed a commit to pciolkosz/cccl that referenced this pull request Aug 4, 2024
* Add a test for Thrust scan with non-commutative op
* Fix printing mismatching sequences of non-addable types before C++17 in Thrust unit tests