Expand and add more CUDA/HIP documentation #1309

msimberg · 2024-11-01T12:13:24Z

Adds documentation for classes that weren't yet documented and updates the existing documentation. Adds a section for pika/cuda.hpp to the API documentation.

Early stages, far from complete, but may require some discussion before continuing.

codacy-production · 2024-11-01T12:18:22Z

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation	Diff coverage
✅ -0.01% (target: -1.00%)	✅ ∅ (target: 90.00%)

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`2df06f7`)	18285	13814	75.55%
Head commit (`0ed5c9d`)	18274 (-11)	13803 (-11)	75.53% (-0.01%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#1309)	0	0	∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings Change summary preferences

_{Codacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more}

docs/api.rst

msimberg · 2024-11-01T12:24:31Z

docs/api.rst

+      :language: c++
+      :start-at: #include
+
+.. doxygenvariable:: pika::cuda::experimental::then_with_stream


This one is a bit weirdly documented at the moment, since we document the variable, but not the call operators of the then_with_stream_t class. I'm between having an example, which is usually clear enough on its own, and just documenting the call operators explicitly. Currently I refer e.g. to \p f as the callable passed to the adaptor, but the user has no information on how that is passed to the adaptor (not much of a problem for then_with_stream but a bigger problem for then_with_cublas since that also takes a pointer mode).

Should I include both an example and document the call operators? If we document the call operators should we do the same for drop_value, drop_operation_state, etc.?

Explore separately from this PR what documentation would look like with e.g. documentation for call operator on CPO type.

I ended up trying to document the types of the CPOs as well, and ended up with something that looks like this:

i.e. first type, then CPO itself, and then the example. I tried to keep it quite short, referring to the CPO documentation for more details. The important part for me was to get the signature of the CPO visible, so I've documented the two operator()s and their parameters.

@albestro, @aurianer, @rasolca, @RMeli thoughts?

examples/documentation/CMakeLists.txt

libs/pika/async_cuda/include/pika/async_cuda/cuda_pool.hpp

libs/pika/async_cuda_base/include/pika/async_cuda_base/cublas_handle.hpp

libs/pika/async_cuda/include/pika/async_cuda/then_on_host.hpp

libs/pika/async_cuda/include/pika/async_cuda/then_with_stream.hpp

libs/pika/async_cuda_base/include/pika/async_cuda_base/cuda_stream.hpp

docs/api.rst

msimberg · 2024-11-07T14:45:57Z

I welcome more feedback on this PR now. I feel like some of the examples I've created are a bit forced/artificial in the name of showing off what's possible with the adaptors, so if you have suggestions on how to improve change them that's particularly welcome. I've also now documented the types of the CUDA/HIP sender adaptors so that we can document the parameters of them. I think it looks decent enough, but interested in hearing feedback from others.

examples/documentation/cuda_overview_documentation.cu

libs/pika/async_cuda/include/pika/async_cuda/then_with_stream.hpp

libs/pika/async_cuda_base/include/pika/async_cuda_base/cublas_handle.hpp

libs/pika/async_cuda_base/include/pika/async_cuda_base/cusolver_handle.hpp

libs/pika/async_cuda/include/pika/async_cuda/then_with_stream.hpp

docs/api.rst

examples/documentation/then_with_cublas_documentation.cu

RMeli · 2024-11-14T14:26:49Z

libs/pika/async_cuda/include/pika/async_cuda/cuda_pool.hpp

+        /// \param device the CUDA device used for scheduling work
+        /// \param num_normal_priority_streams_per_thread the number of normal priority streams per
+        /// thread
+        /// \param num_high_priority_streams_per_thread the number of high priority streams per
+        /// thread
+        /// \param flags flags used to construct CUDA streams
+        /// \param num_cublas_handles the number of cuBLAS handles to create for the whole pool
+        /// \param num_cusolver_handles the number of cuSOLVER handles to create for the whole pool


Add a short comment on how the default values have been selected?

I'll try 🙃 They are "arbitrary"/decent average value that seem to not perform too badly with DLA-Future.

I think that would be enough. It's good for people interested in pika but without any knowledge of DLA-Future to know these values are based on a production project.

I've updated the documentation for the parameters in f267193. I'm anticipating #1294 with the documentation here. I'm still documenting the per-thread parameters here, but I'd like to get #1294 merged first.

Do you find the warnings sufficient? We're essentially making a silent API change with #1294, and I'm worried that it's too silent (especially if one doesn't read documentation or release notes)... The main alternative I can think of is to make the parameters strong types instead of plain integers, thus requiring a explicit conversion at the callsite.

The new documentation looks great, thank you.

The silent change is quite disruptive indeed, and I see why you are worried it is a bit too silent. However, I think it is probably enough combined with the release notes. Mostly because pika is still in 0.x and semantic version explicitly states that anything can change, and I don't see many issues from external users in the repository (so usage outside of DLA-Future might still be reasonably low).

The main alternative I can think of is to make the parameters strong types instead of plain integers, thus requiring a explicit conversion at the callsite.

I did not look at #1294, but another alternative would be to have a flag/option allowing to switch between old and new behavior, and have the old behavior by default together with a deprecation warning. However, it might be a PITA to implement and too overkill for the reasons above.

Ok, thanks, good to hear you're not too concerned. This kind of stuff is indeed why we've stuck to 0.X and semver, so I suppose I should exercise that right... it's just an uneasy feeling knowing that everything will compile just fine, but possibly perform differently.

too overkill for the reasons above.

Yeah, I'd like to avoid that if possible.

However, I am planning on having a warning about this in DLA-Future (and basically introducing a new configuration option), so there the change should definitely not be silent.

libs/pika/async_cuda_base/include/pika/async_cuda_base/cuda_stream.hpp

Document cuda_pool, cuda_scheduler, cuda_stream, cublas_handle, cusolver_handle, as well as expose these with CUDA sender adaptors in the documentation. Adds a high-level example of using CUDA functionality.

…n documentation

…usive

msimberg self-assigned this Nov 1, 2024