Cleanup CUB block/thread load and exchange #1946
template <int DUMMY>
struct LoadInternal<BLOCK_LOAD_DIRECT, DUMMY>
{
  /// Shared memory storage layout type
  using TempStorage = NullType;
Starting from here, I deleted all the documentation blocks of the LoadInternal specializations, because they are private inside of BlockLoad and named Internal. Furthermore, their behavior is amply documented in the BlockLoadAlgorithm enumeration. Also, the comments were highly redundant and, in at least one case, wrong.
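For readers unfamiliar with the pattern under discussion, here is a minimal host-only sketch of it. It is not CUB's code: `Loader`, `LoadAlgorithm`, `LOAD_DIRECT`, `LOAD_STRIPED`, and the explicit `tid` argument are hypothetical stand-ins chosen for illustration. The point it tries to show is that the behavior is documented once on the public enumeration, while the private `LoadInternal` specializations stay undocumented implementation details. The `DUMMY` parameter presumably exists so that the nested specializations are partial rather than explicit, since partial specializations are allowed at class scope.

```cpp
#include <cassert>

// Hypothetical stand-in for an algorithm enumeration; this is where the
// user-facing documentation of each strategy would live.
enum LoadAlgorithm
{
  LOAD_DIRECT, // each thread reads a contiguous chunk of items
  LOAD_STRIPED // items are interleaved across threads
};

// Placeholder type meaning "no shared storage needed" (assumed, for illustration)
struct NullType
{};

template <typename T, int THREADS, int ITEMS_PER_THREAD, LoadAlgorithm ALGORITHM>
class Loader
{
private:
  // Primary template is only declared. DUMMY keeps the specializations below
  // partial, so they can be written inside the class body.
  template <LoadAlgorithm A, int DUMMY>
  struct LoadInternal;

  template <int DUMMY>
  struct LoadInternal<LOAD_DIRECT, DUMMY>
  {
    using TempStorage = NullType;

    static void Load(int tid, const T* in, T (&items)[ITEMS_PER_THREAD])
    {
      for (int i = 0; i < ITEMS_PER_THREAD; ++i)
      {
        items[i] = in[tid * ITEMS_PER_THREAD + i];
      }
    }
  };

  template <int DUMMY>
  struct LoadInternal<LOAD_STRIPED, DUMMY>
  {
    using TempStorage = NullType;

    static void Load(int tid, const T* in, T (&items)[ITEMS_PER_THREAD])
    {
      for (int i = 0; i < ITEMS_PER_THREAD; ++i)
      {
        items[i] = in[i * THREADS + tid];
      }
    }
  };

  // Select the implementation once; users never name LoadInternal directly.
  using Internal = LoadInternal<ALGORITHM, 0>;

public:
  using TempStorage = typename Internal::TempStorage;

  // tid is passed explicitly here because this sketch runs on the host.
  static void Load(int tid, const T* in, T (&items)[ITEMS_PER_THREAD])
  {
    Internal::Load(tid, in, items);
  }
};
```

With this sketch, `Loader<int, 4, 2, LOAD_DIRECT>::Load(1, in, items)` fills `items` with elements 2 and 3 of `in`, whereas the `LOAD_STRIPED` variant picks elements 1 and 5.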
28b6778 to 1808103
🟩 CI finished in 2h 56m: Pass: 100%/249 | Total: 4d 20h | Avg: 28m 03s | Max: 52m 27s | Hits: 61%/248564
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
1808103 to 01d5967
🟩 CI finished in 3h 02m: Pass: 100%/250 | Total: 4d 19h | Avg: 27m 39s | Max: 50m 37s | Hits: 57%/248341
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | pycuda |
🏃 Runner counts (total jobs: 250)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
41 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
Since I had to read up on CUB's block/thread load and block exchange, here are a few improvements.
Let's also check for any SASS differences, since this code is at the heart of all CUB algorithms:
- cub.test.block_load.it_11 changed.
- Identical before and after this PR.