[FEA] Atomics codegen refactor #1993
I had a brief look and here are some minor comments. They apply to many places.
🟨 CI finished in 20h 46m: Pass: 96%/420 | Total: 2d 07h | Avg: 7m 54s | Max: 29m 03s | Hits: 95%/522595

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |

Modifications in project or dependencies?

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |

🏃 Runner counts (total jobs: 420)

| # | Runner |
|---|---|
| 305 | linux-amd64-cpu16 |
| 64 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 23 | windows-amd64-cpu16 |

Just looked through the CMake side of things. Looks good, only a minor suggestion.
Co-authored-by: Allison Piper <[email protected]>
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
🟨 CI finished in 2h 33m: Pass: 96%/420 | Total: 2d 07h | Avg: 7m 56s | Max: 31m 03s | Hits: 87%/522583
🟨 CI finished in 14h 27m: Pass: 98%/420 | Total: 2d 09h | Avg: 8m 11s | Max: 1h 01m | Hits: 90%/522540
I must have hit the wrong button because I do not remember having reviewed this properly today
🟨 CI finished in 4h 55m: Pass: 99%/421 | Total: 2d 18h | Avg: 9m 30s | Max: 1h 05m | Hits: 87%/522353

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |

Modifications in project or dependencies?

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 421)

| # | Runner |
|---|---|
| 305 | linux-amd64-cpu16 |
| 65 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 23 | windows-amd64-cpu16 |

🟨 CI finished in 8h 00m: Pass: 97%/421 | Total: 2d 14h | Avg: 8m 54s | Max: 43m 05s | Hits: 90%/518363
// __ptr, __expected, __desired, __weak, __success_memorder, __failure_memorder, _Sco{});
// }

// template <typename _Tp, typename _Sco>
Why is this all commented out again?
🟩 CI finished in 4h 17m: Pass: 100%/421 | Total: 7d 08h | Avg: 25m 10s | Max: 1h 37m | Hits: 80%/526531
🟨 CI finished in 3h 49m: Pass: 99%/421 | Total: 2d 09h | Avg: 8m 15s | Max: 1h 15m | Hits: 97%/31304
🟩 CI finished in 4h 55m: Pass: 100%/421 | Total: 2d 09h | Avg: 8m 13s | Max: 1h 15m | Hits: 97%/31304
* Initial draft of new atomics backend
* Change atomic fetch ops back to tag dispatch
* Save wip
* Add load/store and support for MMIO
* Begin working on exch
* Enable formatting exchange
* Several signed-ness fixes
* Make atomics ptx tests build. Lit tests are a WIP.
* Fix load/store, some volatileness, and min/max
* Formatting and enabled codegen in all builds
* Make integral.pass.cpp pass
* Make the rest of the atomics tests pass
* Use 128b ld/st instead of vector load as it is not atomic across the whole atom
* Fix copy-paste mistake in load/store
* Whitespace fixup
* Fix 128b .exch using .cas operands
* Make codegen link fmt as PRIVATE
  Co-authored-by: Allison Piper <[email protected]>
* Simplify MMIO down to a static array.
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
* Static -> Inline for codegen functions. Replace endl with '\n'.
* Supply the output stream directly to `fmt::format`
* Update fmtlib.
* Revert `fmt::format(out...)` changes. They don't work on MSVC.
* Fixup libcudacxx codegen CMake stuff
* Remove sneaky cstdef include that was auto-added
* [pre-commit.ci] auto code formatting

---------

Co-authored-by: Allison Piper <[email protected]>
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Description
This PR primarily changes the codegen for libcudacxx's atomics backend.
PTX is mostly unchanged, but more intrinsics are available, and many of them may not even be used. Every intrinsic has a unifying interface that dispatches to the appropriate scope and intrinsic size.
The codegen itself is simpler to modify and I've added support for emitting 128b and MMIO system atomics.
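
As a rough illustration of a unifying interface that dispatches on scope and operand size, here is a minimal host-only sketch of tag dispatch. Every name in it (`block_scope_tag`, `system_scope_tag`, `size_tag`, `fetch_add_impl`, `select_fetch_add`) is hypothetical and is not taken from libcudacxx. The overloads only report which stand-in "intrinsic" was selected, whereas the real backend would wrap inline PTX.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical scope tags, in the spirit of the `_Sco{}` argument
// visible in the reviewed snippet. Not the library's actual types.
struct block_scope_tag {};
struct system_scope_tag {};

// Hypothetical size tag carrying the operand width in bits.
template <std::size_t Bits>
struct size_tag {};

// Stand-ins for generated intrinsics: one overload per (scope, size) pair.
// Each returns a label instead of executing an atomic instruction.
inline std::string fetch_add_impl(block_scope_tag, size_tag<32>) { return "fetch_add.block.b32"; }
inline std::string fetch_add_impl(block_scope_tag, size_tag<64>) { return "fetch_add.block.b64"; }
inline std::string fetch_add_impl(system_scope_tag, size_tag<32>) { return "fetch_add.system.b32"; }
inline std::string fetch_add_impl(system_scope_tag, size_tag<64>) { return "fetch_add.system.b64"; }

// Unifying entry point: derives the size tag from T and forwards the
// scope tag, so overload resolution picks the matching stand-in.
template <typename T, typename Scope>
std::string select_fetch_add(const T*, Scope scope)
{
  static_assert(sizeof(T) == 4 || sizeof(T) == 8,
                "this sketch only covers 32b and 64b operands");
  return fetch_add_impl(scope, size_tag<sizeof(T) * 8>{});
}
```

With these assumed names, calling `select_fetch_add(&x, system_scope_tag{})` on a `std::uint64_t x` resolves at compile time to the 64-bit system-scope overload. Supporting another width in this sketch would mean adding overloads and relaxing the `static_assert`.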
Left for future PRs:
* Expose larger types with the atomic APIs and testing.
* Expose smaller types using 16b CAS ops.
* Add MMIO support to `volatile T` types.
* Fix local scope atomics.
Checklist