[CUDAX] Add can_peer_access_to API to device_ref and check both ways access in get_peers #2642

Merged: 3 commits merged into NVIDIA:main on Oct 30, 2024

pciolkosz
Contributor

While working on the P2P sample, I realized that, in addition to get_peers(), a lower-level, directed P2P query can be useful. The name I picked, can_peer_access_to, might not be the best, but I think it matches what the API does. This API does not query whether memory is accessible, but whether it is possible to enable peer access, and I believe the "which device accesses which" order of the devices is the correct one here. It is also consistent with the argument order of cudaDeviceCanAccessPeer.
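
A minimal sketch of what a directed query looks like, assuming a hypothetical `device_ref_sketch` type (this is not the actual cudax `device_ref`). The access check is injected as a callable so the sketch runs without a GPU; in real code it would wrap `cudaDeviceCanAccessPeer(&flag, device, peerDevice)`, where the first device is the one doing the accessing. That is the order `dev.can_peer_access_to(other)` mirrors.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical directed query: true if `device` can directly access memory
// that lives on `peer`. With the CUDA runtime this would be implemented as
//   int flag = 0; cudaDeviceCanAccessPeer(&flag, device, peer); return flag != 0;
using peer_query = std::function<bool(int device, int peer)>;

// Hypothetical stand-in for a device handle; NOT the cudax device_ref.
class device_ref_sketch
{
public:
  device_ref_sketch(int id, peer_query query)
      : id_(id), query_(std::move(query)) {}

  int get() const { return id_; }

  // Directed: "this device can access memory on `other`".
  // Argument order matches cudaDeviceCanAccessPeer(&flag, this, other).
  bool can_peer_access_to(const device_ref_sketch& other) const
  {
    assert(query_ && "a peer query must be provided");
    return query_(id_, other.get());
  }

private:
  int id_;
  peer_query query_;
};
```

Because the query is directed, `d0.can_peer_access_to(d1)` and `d1.can_peer_access_to(d0)` can legitimately return different results on a non-symmetric topology.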

I also changed get_peers() to check peer accessibility in both directions, just to be safe. In the rare cases where peer access is not symmetric, the directed API above can be used instead.
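
The both-directions check can be sketched as follows. `get_peers_sketch` and the injected `peer_query` callable are hypothetical names used for illustration, not the cudax implementation; skipping the queried device itself is also an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical directed query: true if `device` can access memory on `peer`
// (same argument order as cudaDeviceCanAccessPeer).
using peer_query = std::function<bool(int device, int peer)>;

// A device is reported as a peer of `dev` only if access is possible in BOTH
// directions. One-way (asymmetric) cases are left to the directed query.
std::vector<int> get_peers_sketch(int dev, int device_count, const peer_query& can_access)
{
  assert(dev >= 0 && dev < device_count);
  std::vector<int> peers;
  for (int other = 0; other < device_count; ++other)
  {
    if (other == dev)
    {
      continue;  // this sketch does not report a device as its own peer
    }
    if (can_access(dev, other) && can_access(other, dev))
    {
      peers.push_back(other);
    }
  }
  return peers;
}
```

With this definition, a one-way link (for example 0 can access 1, but 1 cannot access 0) does not make the two devices peers, which is the conservative behavior described above.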

pciolkosz requested a review from a team as a code owner October 28, 2024 23:57
Contributor

🟩 CI finished in 27m 39s: Pass: 100%/54 | Total: 4h 19m | Avg: 4m 48s | Max: 23m 08s | Hits: 89%/224
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 19m | Avg: 4m 48s | Max: 23m 08s | Hits: 89%/224

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 09m | Avg:  4m 59s | Max: 23m 08s | Hits:  89%/224   
      🟩 arm64              Pass: 100%/4   | Total: 10m 12s | Avg:  2m 33s | Max:  2m 40s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 37m | Avg:  5m 06s | Max: 23m 08s | Hits:  89%/112   
      🟩 12.5               Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
      🟩 12.6               Pass: 100%/33  | Total:  2h 32m | Avg:  4m 37s | Max: 22m 02s | Hits:  89%/112   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 37m | Avg:  5m 06s | Max: 23m 08s | Hits:  89%/112   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 32m | Avg:  4m 37s | Max: 22m 02s | Hits:  89%/112   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 19m | Avg:  4m 48s | Max: 23m 08s | Hits:  89%/224   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  6m 29s | Avg:  3m 14s | Max:  3m 20s
      🟩 Clang10            Pass: 100%/2   | Total:  6m 21s | Avg:  3m 10s | Max:  3m 25s
      🟩 Clang11            Pass: 100%/4   | Total: 12m 29s | Avg:  3m 07s | Max:  3m 21s
      🟩 Clang12            Pass: 100%/4   | Total: 12m 21s | Avg:  3m 05s | Max:  3m 20s
      🟩 Clang13            Pass: 100%/4   | Total: 12m 12s | Avg:  3m 03s | Max:  3m 11s
      🟩 Clang14            Pass: 100%/4   | Total: 30m 10s | Avg:  7m 32s | Max: 20m 42s
      🟩 Clang15            Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  3m 46s
      🟩 Clang16            Pass: 100%/4   | Total: 11m 25s | Avg:  2m 51s | Max:  3m 17s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 16s | Avg:  3m 38s | Max:  3m 39s
      🟩 Clang18            Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 16m 54s
      🟩 GCC9               Pass: 100%/2   | Total:  5m 42s | Avg:  2m 51s | Max:  2m 55s
      🟩 GCC10              Pass: 100%/4   | Total: 11m 48s | Avg:  2m 57s | Max:  3m 06s
      🟩 GCC11              Pass: 100%/4   | Total: 11m 36s | Avg:  2m 54s | Max:  3m 06s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 12m | Avg: 10m 25s | Max: 23m 08s
      🟩 GCC13              Pass: 100%/3   | Total:  8m 06s | Avg:  2m 42s | Max:  2m 51s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s | Hits:  89%/112   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  7m 02s | Avg:  7m 02s | Max:  7m 02s | Hits:  89%/112   
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 05m | Avg:  4m 11s | Max: 20m 42s
      🟩 GCC                Pass: 100%/20  | Total:  1h 50m | Avg:  5m 30s | Max: 23m 08s
      🟩 MSVC               Pass: 100%/2   | Total: 13m 57s | Avg:  6m 58s | Max:  7m 02s | Hits:  89%/224   
      🟩 NVHPC              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 19m | Avg:  4m 48s | Max: 23m 08s | Hits:  89%/224   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  2h 41m | Avg:  3m 17s | Max:  7m 02s | Hits:  89%/224   
      🟩 Test               Pass: 100%/5   | Total:  1h 38m | Avg: 19m 39s | Max: 23m 08s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 30s | Avg:  2m 30s | Max:  2m 30s
      🟩 90a                Pass: 100%/1   | Total:  2m 51s | Avg:  2m 51s | Max:  2m 51s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 09m | Avg:  4m 26s | Max: 23m 08s
      🟩 20                 Pass: 100%/25  | Total:  2h 10m | Avg:  5m 13s | Max: 20m 42s | Hits:  89%/224   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CCCL C Parallel Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CCCL C Parallel Library

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

pciolkosz changed the title from "Add can_peer_access_to API to device_ref and check both ways access in get_peers" to "[CUDAX] Add can_peer_access_to API to device_ref and check both ways access in get_peers" on Oct 29, 2024
//!
//! @param __other_dev Device to query the peer access
//! @return true if it's possible for this device to access the specified device's memory
bool can_peer_access_to(device_ref __other_dev) const
Collaborator

How about can_access_peer?

Contributor Author

I would be fine with it too. I guess the issue is that in almost all cases peer access is symmetric, but technically there could be cases where it's not. I tried to capture the direction in the name, dev1.can_peer_access_to(dev2), but I don't know if it's a good idea.

Collaborator

dev1.has_peer_access_to(dev2)? or just dev1.is_peer_of(dev2)?

Contributor Author

What complicates it further is that this API only queries whether peer access is possible, while actually enabling it is a separate API (cudaDeviceEnablePeerAccess). But since we don't expose device-level enablement of peer access, and instead do it on a per-memory-pool / per-memory-resource basis, maybe it's fine and the two won't be confused?
(That is, the capability query happens at the device level, enablement happens on memory_pool / memory_resource, and the type separation handles the semantic separation.)
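
To illustrate the split described above, here is a hypothetical sketch; `memory_pool_sketch` and `enable_access_from` are invented names, not the cudax memory_pool / memory_resource API. The capability check stays a device-level query, while enablement is a method on the pool object. With the CUDA runtime, the enablement step would roughly correspond to calling `cudaMemPoolSetAccess` with a `cudaMemAccessDesc` for the accessing device; here it only records the accessor so that the sketch runs without a GPU.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>
#include <utility>

// Hypothetical device-level capability query (directed):
// true if `device` can access memory that lives on `peer`.
using peer_query = std::function<bool(int device, int peer)>;

// Hypothetical memory pool owned by one device. Enabling access is a
// pool-level operation, separate from the device-level capability query.
class memory_pool_sketch
{
public:
  memory_pool_sketch(int owner, peer_query can_access)
      : owner_(owner), can_access_(std::move(can_access))
  {
    assert(can_access_ && "a capability query must be provided");
  }

  // Grant `accessor` access to this pool's allocations. The accessor reads
  // memory that lives on owner_, so the capability that matters is
  // can_access(accessor, owner_), not the reverse direction.
  void enable_access_from(int accessor)
  {
    if (accessor == owner_)
    {
      return;  // in this sketch the owning device always has access
    }
    if (!can_access_(accessor, owner_))
    {
      throw std::runtime_error("peer access is not supported in this direction");
    }
    // Real code would call cudaMemPoolSetAccess here; the sketch only records it.
    accessors_.insert(accessor);
  }

  bool is_accessible_from(int device) const
  {
    return device == owner_ || accessors_.count(device) != 0;
  }

private:
  int owner_;
  peer_query can_access_;
  std::set<int> accessors_;
};
```

Keeping the query on the device and the enablement on the pool means a reader cannot mistake "is it possible?" for "is it enabled?", since the two live on different types.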

I also very much like how is_peer_of reads; the main problem is that it's technically possible for GPU1 to have peer access to GPU2, but not the other way around. Being a peer sounds like a symmetric relation, while "peer access to" less so?
I think I would lean towards "peer accessible" meaning that one GPU can access another GPU's memory, and "peers" meaning GPUs that are peer accessible in both directions, as in get_peers. Since this is a one-way query, the name would keep peer_access_to, and I think "has" reads better than "can". I will update the name.

Another problem is that I am not sure whether the current CUDA Programming Guide considers peer devices to be any two devices or only peer-access-capable devices. But maybe that doesn't stop us from committing to the above definition of "peer" devices.

Contributor

🟩 CI finished in 1h 33m: Pass: 100%/54 | Total: 4h 49m | Avg: 5m 21s | Max: 17m 39s | Hits: 55%/224
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 49m | Avg: 5m 21s | Max: 17m 39s | Hits: 55%/224

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 34m | Avg:  5m 29s | Max: 17m 39s | Hits:  55%/224   
      🟩 arm64              Pass: 100%/4   | Total: 14m 58s | Avg:  3m 44s | Max:  3m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 42m | Avg:  5m 23s | Max: 17m 39s | Hits:  55%/112   
      🟩 12.5               Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
      🟩 12.6               Pass: 100%/33  | Total:  2h 54m | Avg:  5m 16s | Max: 17m 37s | Hits:  55%/112   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 42m | Avg:  5m 23s | Max: 17m 39s | Hits:  55%/112   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 54m | Avg:  5m 16s | Max: 17m 37s | Hits:  55%/112   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 49m | Avg:  5m 21s | Max: 17m 39s | Hits:  55%/224   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  7m 59s | Avg:  3m 59s | Max:  4m 18s
      🟩 Clang10            Pass: 100%/2   | Total:  9m 31s | Avg:  4m 45s | Max:  4m 56s
      🟩 Clang11            Pass: 100%/4   | Total: 14m 53s | Avg:  3m 43s | Max:  3m 59s
      🟩 Clang12            Pass: 100%/4   | Total: 15m 03s | Avg:  3m 45s | Max:  4m 06s
      🟩 Clang13            Pass: 100%/4   | Total: 15m 01s | Avg:  3m 45s | Max:  3m 57s
      🟩 Clang14            Pass: 100%/4   | Total: 28m 58s | Avg:  7m 14s | Max: 17m 39s
      🟩 Clang15            Pass: 100%/2   | Total:  7m 32s | Avg:  3m 46s | Max:  3m 52s
      🟩 Clang16            Pass: 100%/4   | Total: 15m 46s | Avg:  3m 56s | Max:  4m 06s
      🟩 Clang17            Pass: 100%/2   | Total:  8m 18s | Avg:  4m 09s | Max:  4m 10s
      🟩 Clang18            Pass: 100%/2   | Total: 21m 18s | Avg: 10m 39s | Max: 17m 37s
      🟩 GCC9               Pass: 100%/2   | Total:  8m 02s | Avg:  4m 01s | Max:  4m 12s
      🟩 GCC10              Pass: 100%/4   | Total: 14m 43s | Avg:  3m 40s | Max:  3m 50s
      🟩 GCC11              Pass: 100%/4   | Total: 14m 59s | Avg:  3m 44s | Max:  4m 08s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 06m | Avg:  9m 31s | Max: 17m 20s
      🟩 GCC13              Pass: 100%/3   | Total: 10m 51s | Avg:  3m 37s | Max:  3m 52s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 00s | Avg:  9m 00s | Max:  9m 00s | Hits:  55%/112   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 03s | Avg:  8m 03s | Max:  8m 03s | Hits:  55%/112   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 24m | Avg:  4m 48s | Max: 17m 39s
      🟩 GCC                Pass: 100%/20  | Total:  1h 55m | Avg:  5m 45s | Max: 17m 20s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 03s | Avg:  8m 31s | Max:  9m 00s | Hits:  55%/224   
      🟩 NVHPC              Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 49m | Avg:  5m 21s | Max: 17m 39s | Hits:  55%/224   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 23m | Avg:  4m 08s | Max:  9m 00s | Hits:  55%/224   
      🟩 Test               Pass: 100%/5   | Total:  1h 26m | Avg: 17m 18s | Max: 17m 39s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
      🟩 90a                Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 23m | Avg:  4m 56s | Max: 17m 20s
      🟩 20                 Pass: 100%/25  | Total:  2h 26m | Avg:  5m 51s | Max: 17m 39s | Hits:  55%/224   
    

miscco merged commit 7ff1d7b into NVIDIA:main on Oct 30, 2024
69 checks passed
fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 5, 2024
Labels
None yet
Projects
Archived in project
3 participants