Integrate CUDASTF -> CudaX #2572
Conversation
#pragma once

#include <cuda/experimental/__stf/allocators/block_allocator.cuh>
We try to include everything we need directly and avoid relying on transitive includes. That makes the codebase much simpler to work with as the project grows.
When you say transitive includes, is that only for these few headers?
I mean that every header should include everything it needs itself.
cudax/include/cuda/experimental/__stf/utility/cuda_attributes.cuh
(resolved)
🟨 CI finished in 32m 19s: Pass: 73%/52 | Total: 10h 33m | Avg: 12m 11s | Max: 19m 25s
| Modified | Project |
|---|---|
| +/- | CCCL Infrastructure |
|  | libcu++ |
|  | CUB |
|  | Thrust |
| +/- | CUDA Experimental |
|  | pycuda |
|  | CCCL C Parallel Library |

Modifications in project or dependencies?

| Modified | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CCCL C Parallel Library |

🏃 Runner counts (total jobs: 52)

| # | Runner |
|---|---|
| 41 | linux-amd64-cpu16 |
| 5 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
🟩 CI finished in 1h 10m: Pass: 100%/372 | Total: 1d 16h | Avg: 6m 31s | Max: 39m 38s | Hits: 94%/27969
| Modified | Project |
|---|---|
| +/- | CCCL Infrastructure |
|  | libcu++ |
|  | CUB |
|  | Thrust |
| +/- | CUDA Experimental |
|  | pycuda |
|  | CCCL C Parallel Library |

Modifications in project or dependencies?

| Modified | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CCCL C Parallel Library |

🏃 Runner counts (total jobs: 372)

| # | Runner |
|---|---|
| 298 | linux-amd64-cpu16 |
| 31 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
🟩 CI finished in 1h 07m: Pass: 100%/372 | Total: 2d 02h | Avg: 8m 03s | Max: 59m 34s | Hits: 99%/27969
| Modified | Project |
|---|---|
| +/- | CCCL Infrastructure |
|  | libcu++ |
|  | CUB |
|  | Thrust |
| +/- | CUDA Experimental |
|  | pycuda |
|  | CCCL C Parallel Library |

Modifications in project or dependencies?

| Modified | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CCCL C Parallel Library |

🏃 Runner counts (total jobs: 372)

| # | Runner |
|---|---|
| 298 | linux-amd64-cpu16 |
| 31 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
CUDASTF is an implementation of the Sequential Task Flow model for CUDA. The availability of parallelism within modern hardware has dramatically increased, with large nodes now featuring multiple accelerators. As a result, maximizing concurrency at the application level in a scalable manner has become a crucial priority. To effectively hide latencies, it is essential to achieve the highest level of asynchrony possible.

CUDASTF introduces a tasking model that automates data transfers while enforcing implicit data-driven dependencies. Implemented as a header-only C++ library, CUDASTF builds on top of CUDA APIs to simplify the development of multi-GPU applications. CUDASTF is currently capable of generating parallel applications using either the CUDA stream API or the CUDA graph API.

Co-authored-by: Cédric Augonnet <[email protected]>
Co-authored-by: Andrei Alexandrescu <[email protected]>
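As a rough sketch of the tasking model described above (hedged: the exact API names `context`, `logical_data`, and `parallel_for` are taken from the CUDASTF documentation and may differ in detail in this release; compiling requires nvcc and a GPU), a task declares which logical data it reads or writes, and CUDASTF infers the dependencies and transfers:

```cuda
#include <cuda/experimental/stf.cuh>

using namespace cuda::experimental::stf;

int main()
{
  context ctx; // selects the stream or graph backend at runtime

  double X[128], Y[128];
  for (size_t i = 0; i < 128; i++) { X[i] = 1.0; Y[i] = 2.0; }

  // Wrap host arrays as logical data; CUDASTF manages device copies.
  auto lX = ctx.logical_data(X);
  auto lY = ctx.logical_data(Y);

  // Y += X. Declaring lX.read() and lY.rw() lets CUDASTF infer the
  // data-driven dependencies and insert transfers automatically.
  ctx.parallel_for(lX.shape(), lX.read(), lY.rw())
      ->*[] __device__(size_t i, auto x, auto y) { y(i) += x(i); };

  ctx.finalize(); // wait for all tasks and write results back
  return 0;
}
```

Because tasks only state their data accesses, the same source can be executed through either the CUDA stream API or the CUDA graph API, as the description notes.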
Description
closes
Checklist