CUB - Enable DPX Reduction #2286
thanks @bernhardmgruber @miscco @mfbalin for all your suggestions!
In principle, LGTM. Here are some more suggestions:
/ok to test
/ok to test
@gevtushenko @bernhardmgruber @miscco I practically rewrote |
Here is some feedback:
#define _CUB_TEMPLATE_REQUIRES(...) ::cuda::std::__enable_if_t<(__VA_ARGS__)>* = nullptr
Suggestion: This feels generic enough to be part of libcu++ and be named _CCCL_TEMPLATE_REQUIRES. @miscco maybe this is already covered by your concept emulation?
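For readers unfamiliar with the pattern, a macro of this shape is usually placed as a trailing, defaulted non-type template parameter so that an overload only participates when the condition holds. The following is a minimal host-only sketch, not code from this PR: `DEMO_TEMPLATE_REQUIRES` and `dispatch_kind` are hypothetical names, and `std::enable_if_t` stands in for `::cuda::std::__enable_if_t` so the example compiles without CCCL headers.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the macro under review. std::enable_if_t replaces
// ::cuda::std::__enable_if_t so this sketch builds with a plain C++14 compiler.
#define DEMO_TEMPLATE_REQUIRES(...) std::enable_if_t<(__VA_ARGS__)>* = nullptr

// Participates in overload resolution only for 16-bit integral types.
template <typename T,
          DEMO_TEMPLATE_REQUIRES(std::is_integral<T>::value && sizeof(T) == 2)>
constexpr int dispatch_kind(T)
{
  return 1; // a specialized path would go here
}

// Fallback for every other type; the conditions are mutually exclusive,
// so exactly one overload is viable for any T.
template <typename T,
          DEMO_TEMPLATE_REQUIRES(!(std::is_integral<T>::value && sizeof(T) == 2))>
constexpr int dispatch_kind(T)
{
  return 0; // generic path
}

static_assert(dispatch_kind(std::int16_t{1}) == 1, "16-bit integers take the specialized path");
static_assert(dispatch_kind(1.0f) == 0, "other types take the generic path");
```

Because the constraint lives in the type of a non-type template parameter rather than in a default type argument, the two overloads have distinct signatures and do not collide as redeclarations.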
/ok to test
/ok to test
/ok to test
🟨 CI finished in 4h 44m: Pass: 83%/251 | Total: 6d 07h | Avg: 36m 16s | Max: 1h 12m | Hits: 69%/24387
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 251)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
42 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
🟨 CI finished in 7h 14m: Pass: 83%/251 | Total: 6d 08h | Avg: 36m 26s | Max: 1h 12m | Hits: 69%/24387
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 251)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
42 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
/ok to test
/ok to test
🟩 CI finished in 6h 28m: Pass: 100%/251 | Total: 6d 16h | Avg: 38m 22s | Max: 1h 36m | Hits: 68%/24387
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 251)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
42 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
Address #2032

Enable DPX SIMD comparison (min/max) instructions for Hopper+ architectures for uint16_t and int16_t data types.

Additional optimizations not strictly related to DPX, as well as overload cleanup, will be part of another issue/PR.
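To illustrate what a SIMD min/max over 16-bit lanes computes, here is a portable host-side sketch that emulates a two-lane packed `int16_t` max on a 32-bit word and uses it in a reduction. It is an assumption-laden illustration, not this PR's implementation: it calls no DPX intrinsic, and `pack_s16x2`, `lane_s16`, `emulated_max_s16x2`, and `reduce_max_s16` are hypothetical names. On a GPU with hardware support, the per-lane work in `emulated_max_s16x2` would be done by a single instruction.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Pack two int16_t lanes into one 32-bit word (lo in bits 0-15, hi in bits 16-31).
inline std::uint32_t pack_s16x2(std::int16_t lo, std::int16_t hi)
{
  return static_cast<std::uint32_t>(static_cast<std::uint16_t>(lo))
       | (static_cast<std::uint32_t>(static_cast<std::uint16_t>(hi)) << 16);
}

// Extract lane 0 (low half) or lane 1 (high half) as a signed 16-bit value.
inline std::int16_t lane_s16(std::uint32_t packed, int lane)
{
  return static_cast<std::int16_t>(static_cast<std::uint16_t>(packed >> (16 * lane)));
}

// Host emulation of a per-lane signed max on two packed 16-bit values.
inline std::uint32_t emulated_max_s16x2(std::uint32_t a, std::uint32_t b)
{
  return pack_s16x2(std::max(lane_s16(a, 0), lane_s16(b, 0)),
                    std::max(lane_s16(a, 1), lane_s16(b, 1)));
}

// Max-reduce an array by consuming two elements per packed operation.
// Returns the lowest int16_t value when n == 0.
inline std::int16_t reduce_max_s16(const std::int16_t* data, int n)
{
  const std::int16_t lowest = std::numeric_limits<std::int16_t>::lowest();
  std::uint32_t acc         = pack_s16x2(lowest, lowest);
  int i                     = 0;
  for (; i + 1 < n; i += 2)
  {
    acc = emulated_max_s16x2(acc, pack_s16x2(data[i], data[i + 1]));
  }
  if (i < n) // odd tail: pad the unused lane with the identity element
  {
    acc = emulated_max_s16x2(acc, pack_s16x2(data[i], lowest));
  }
  // Final step: combine the two lanes of the accumulator.
  return std::max(lane_s16(acc, 0), lane_s16(acc, 1));
}
```

The point of the packed form is that each step handles two elements, so a reduction over 16-bit data needs roughly half as many comparison operations as a scalar loop.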
NVIDIA H100 80GB HBM3
NVIDIA H200