[CUDAX] Add can_peer_access_to API to device_ref and check both ways access in get_peers #2642

pciolkosz · 2024-10-28T23:57:29Z

While working on the P2P sample I realized that, in addition to get_peers(), it can be useful to have a lower-level, directed P2P query. The name I picked, can_peer_access_to, might not be the best, but I think it matches what the API does. This API does not query whether memory is accessible, but whether it is possible to enable peer access. I believe the what-can-access-what order of devices is the correct one here; it is also in line with the argument order of cudaDeviceCanAccessPeer.

I also changed get_peers() to check peer accessibility both ways just to be safe. The above API can be used in those weird non-symmetrical peer access cases instead.
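The difference between the directed query and the two-way check can be sketched with a small model. This is not the cudax implementation: the `access_matrix` type, the free functions, and `example_topology` are hypothetical stand-ins for what the driver would report, and treating a device as not being its own peer is an assumption of the model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the driver's answer: m[a][b] is true when
// device a is able to access memory that lives on device b.
using access_matrix = std::vector<std::vector<bool>>;

// Directed query, in "what can access what" order.
// Model assumption: a device is never reported as its own peer.
inline bool can_peer_access_to(const access_matrix& m, std::size_t dev, std::size_t other)
{
  assert(dev < m.size() && other < m[dev].size());
  return dev != other && m[dev][other];
}

// Two-way filter: a device is listed as a peer only if access works in both directions.
inline std::vector<std::size_t> get_peers(const access_matrix& m, std::size_t dev)
{
  std::vector<std::size_t> peers;
  for (std::size_t other = 0; other < m.size(); ++other)
  {
    if (can_peer_access_to(m, dev, other) && can_peer_access_to(m, other, dev))
    {
      peers.push_back(other);
    }
  }
  return peers;
}

// Example topology: 0 <-> 1 is symmetric, 0 -> 2 works in one direction only.
inline access_matrix example_topology()
{
  return {{false, true, true}, {true, false, false}, {false, false, false}};
}
```

In this model device 2 never appears in the peer list of device 0, yet the directed query still reports that device 0 can access device 2, which is the non-symmetrical case the lower-level API is meant for.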

github-actions · 2024-10-29T01:27:25Z

🟩 CI finished in 27m 39s: Pass: 100%/54 | Total: 4h 19m | Avg: 4m 48s | Max: 23m 08s | Hits: 89%/224

🟩 cudax: Pass: 100%/54 | Total: 4h 19m | Avg: 4m 48s | Max: 23m 08s | Hits: 89%/224

🟩 cpu
  🟩 amd64              Pass: 100%/50  | Total:  4h 09m | Avg:  4m 59s | Max: 23m 08s | Hits:  89%/224   
  🟩 arm64              Pass: 100%/4   | Total: 10m 12s | Avg:  2m 33s | Max:  2m 40s
🟩 ctk
  🟩 12.0               Pass: 100%/19  | Total:  1h 37m | Avg:  5m 06s | Max: 23m 08s | Hits:  89%/112   
  🟩 12.5               Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
  🟩 12.6               Pass: 100%/33  | Total:  2h 32m | Avg:  4m 37s | Max: 22m 02s | Hits:  89%/112   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 37m | Avg:  5m 06s | Max: 23m 08s | Hits:  89%/112   
  🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
  🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 32m | Avg:  4m 37s | Max: 22m 02s | Hits:  89%/112   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/54  | Total:  4h 19m | Avg:  4m 48s | Max: 23m 08s | Hits:  89%/224   
🟩 cxx
  🟩 Clang9             Pass: 100%/2   | Total:  6m 29s | Avg:  3m 14s | Max:  3m 20s
  🟩 Clang10            Pass: 100%/2   | Total:  6m 21s | Avg:  3m 10s | Max:  3m 25s
  🟩 Clang11            Pass: 100%/4   | Total: 12m 29s | Avg:  3m 07s | Max:  3m 21s
  🟩 Clang12            Pass: 100%/4   | Total: 12m 21s | Avg:  3m 05s | Max:  3m 20s
  🟩 Clang13            Pass: 100%/4   | Total: 12m 12s | Avg:  3m 03s | Max:  3m 11s
  🟩 Clang14            Pass: 100%/4   | Total: 30m 10s | Avg:  7m 32s | Max: 20m 42s
  🟩 Clang15            Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  3m 46s
  🟩 Clang16            Pass: 100%/4   | Total: 11m 25s | Avg:  2m 51s | Max:  3m 17s
  🟩 Clang17            Pass: 100%/2   | Total:  7m 16s | Avg:  3m 38s | Max:  3m 39s
  🟩 Clang18            Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 16m 54s
  🟩 GCC9               Pass: 100%/2   | Total:  5m 42s | Avg:  2m 51s | Max:  2m 55s
  🟩 GCC10              Pass: 100%/4   | Total: 11m 48s | Avg:  2m 57s | Max:  3m 06s
  🟩 GCC11              Pass: 100%/4   | Total: 11m 36s | Avg:  2m 54s | Max:  3m 06s
  🟩 GCC12              Pass: 100%/7   | Total:  1h 12m | Avg: 10m 25s | Max: 23m 08s
  🟩 GCC13              Pass: 100%/3   | Total:  8m 06s | Avg:  2m 42s | Max:  2m 51s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s | Hits:  89%/112   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  7m 02s | Avg:  7m 02s | Max:  7m 02s | Hits:  89%/112   
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
🟩 cxx_family
  🟩 Clang              Pass: 100%/30  | Total:  2h 05m | Avg:  4m 11s | Max: 20m 42s
  🟩 GCC                Pass: 100%/20  | Total:  1h 50m | Avg:  5m 30s | Max: 23m 08s
  🟩 MSVC               Pass: 100%/2   | Total: 13m 57s | Avg:  6m 58s | Max:  7m 02s | Hits:  89%/224   
  🟩 NVHPC              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 03s
🟩 gpu
  🟩 v100               Pass: 100%/54  | Total:  4h 19m | Avg:  4m 48s | Max: 23m 08s | Hits:  89%/224   
🟩 jobs
  🟩 Build              Pass: 100%/49  | Total:  2h 41m | Avg:  3m 17s | Max:  7m 02s | Hits:  89%/224   
  🟩 Test               Pass: 100%/5   | Total:  1h 38m | Avg: 19m 39s | Max: 23m 08s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 30s | Avg:  2m 30s | Max:  2m 30s
  🟩 90a                Pass: 100%/1   | Total:  2m 51s | Avg:  2m 51s | Max:  2m 51s
🟩 std
  🟩 17                 Pass: 100%/29  | Total:  2h 09m | Avg:  4m 26s | Max: 23m 08s
  🟩 20                 Pass: 100%/25  | Total:  2h 10m | Avg:  5m 13s | Max: 20m 42s | Hits:  89%/224

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	pycuda
	CCCL C Parallel Library

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	pycuda
	CCCL C Parallel Library

🏃‍ Runner counts (total jobs: 54)

#	Runner
43	`linux-amd64-cpu16`
5	`linux-amd64-gpu-v100-latest-1`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`

miscco · 2024-10-29T06:38:34Z

cudax/include/cuda/experimental/__device/device_ref.cuh

+  //!
+  //! @param __other_dev Device to query the peer access
+  //! @return true if its possible for this device to access the specified device's memory
+  bool can_peer_access_to(device_ref __other_dev) const


How about can_access_peer

pciolkosz · 2024-10-29

I would be fine with it too. I guess the issue is that in almost all cases peer access is symmetrical, but technically there could be cases where it's not. I tried to capture the direction in the name, dev1.can_peer_access_to(dev2), but I don't know if it's a good idea.

ericniebler · 2024-10-29

dev1.has_peer_access_to(dev2)? Or just dev1.is_peer_of(dev2)?

pciolkosz · 2024-10-29

What complicates it further is that this API only queries whether peer access is possible, while the actual enablement is a separate API (cudaDeviceEnablePeerAccess). But maybe, because we don't expose device-level enablement of peer access and instead do it on a memory pool / memory resource basis, it's fine and the two won't be confused?
(So the capability query is done at the device level, enablement is on memory_pool / memory_resource, and type separation handles the semantic separation.)
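A minimal sketch of that type separation, under stated assumptions: the `device` and `memory_pool` types below, and the names `enable_access_from` and `is_accessible_from`, are hypothetical and chosen for illustration only; they are not the cudax API. The point is only that the device type answers capability questions while the pool type owns enablement.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <utility>

// Hypothetical device handle: it can only answer capability questions.
struct device
{
  std::size_t id;
  // Hypothetical capability table of (accessor, target) pairs.
  const std::set<std::pair<std::size_t, std::size_t>>* capable;

  bool has_peer_access_to(const device& other) const
  {
    return capable->count({id, other.id}) != 0;
  }
};

// Hypothetical memory pool: enablement lives here, not on the device.
class memory_pool
{
public:
  explicit memory_pool(device owner) : owner_(owner) {}

  // Grants `accessor` access to this pool's memory if the capability exists.
  void enable_access_from(const device& accessor)
  {
    if (!accessor.has_peer_access_to(owner_))
    {
      throw std::invalid_argument("accessor cannot peer-access the pool's device");
    }
    enabled_.insert(accessor.id);
  }

  // The owning device can always reach its own pool in this model.
  bool is_accessible_from(const device& accessor) const
  {
    return accessor.id == owner_.id || enabled_.count(accessor.id) != 0;
  }

private:
  device owner_;
  std::set<std::size_t> enabled_;
};
```

With this split, a positive capability answer on the device does not imply that any memory is accessible yet; that only changes once a pool has been asked to enable access.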

I also very much like how is_peer_of reads. The main problem is that technically it's possible to have GPU1 with peer access to GPU2, but not the other way around. Being a peer sounds like a symmetric relation, while "peer access to" less so?
I think I would lean towards "peer accessible" meaning a GPU can access another GPU's memory, and "peers" meaning GPUs that are peer accessible both ways, like in get_peers. Since this is a one-way query, it would stay as peer_access_to, and I think "has" might be better than "can". I will update the name.

Another problem is that I am not sure whether the current CUDA Programming Guide considers peer devices to be just any two devices or only peer-access-capable devices. But maybe that doesn't stop us from committing to the above definition of "peer" devices.

github-actions · 2024-10-30T01:59:09Z

🟩 CI finished in 1h 33m: Pass: 100%/54 | Total: 4h 49m | Avg: 5m 21s | Max: 17m 39s | Hits: 55%/224

🟩 cudax: Pass: 100%/54 | Total: 4h 49m | Avg: 5m 21s | Max: 17m 39s | Hits: 55%/224

🟩 cpu
  🟩 amd64              Pass: 100%/50  | Total:  4h 34m | Avg:  5m 29s | Max: 17m 39s | Hits:  55%/224   
  🟩 arm64              Pass: 100%/4   | Total: 14m 58s | Avg:  3m 44s | Max:  3m 54s
🟩 ctk
  🟩 12.0               Pass: 100%/19  | Total:  1h 42m | Avg:  5m 23s | Max: 17m 39s | Hits:  55%/112   
  🟩 12.5               Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
  🟩 12.6               Pass: 100%/33  | Total:  2h 54m | Avg:  5m 16s | Max: 17m 37s | Hits:  55%/112   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 42m | Avg:  5m 23s | Max: 17m 39s | Hits:  55%/112   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
  🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 54m | Avg:  5m 16s | Max: 17m 37s | Hits:  55%/112   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/54  | Total:  4h 49m | Avg:  5m 21s | Max: 17m 39s | Hits:  55%/224   
🟩 cxx
  🟩 Clang9             Pass: 100%/2   | Total:  7m 59s | Avg:  3m 59s | Max:  4m 18s
  🟩 Clang10            Pass: 100%/2   | Total:  9m 31s | Avg:  4m 45s | Max:  4m 56s
  🟩 Clang11            Pass: 100%/4   | Total: 14m 53s | Avg:  3m 43s | Max:  3m 59s
  🟩 Clang12            Pass: 100%/4   | Total: 15m 03s | Avg:  3m 45s | Max:  4m 06s
  🟩 Clang13            Pass: 100%/4   | Total: 15m 01s | Avg:  3m 45s | Max:  3m 57s
  🟩 Clang14            Pass: 100%/4   | Total: 28m 58s | Avg:  7m 14s | Max: 17m 39s
  🟩 Clang15            Pass: 100%/2   | Total:  7m 32s | Avg:  3m 46s | Max:  3m 52s
  🟩 Clang16            Pass: 100%/4   | Total: 15m 46s | Avg:  3m 56s | Max:  4m 06s
  🟩 Clang17            Pass: 100%/2   | Total:  8m 18s | Avg:  4m 09s | Max:  4m 10s
  🟩 Clang18            Pass: 100%/2   | Total: 21m 18s | Avg: 10m 39s | Max: 17m 37s
  🟩 GCC9               Pass: 100%/2   | Total:  8m 02s | Avg:  4m 01s | Max:  4m 12s
  🟩 GCC10              Pass: 100%/4   | Total: 14m 43s | Avg:  3m 40s | Max:  3m 50s
  🟩 GCC11              Pass: 100%/4   | Total: 14m 59s | Avg:  3m 44s | Max:  4m 08s
  🟩 GCC12              Pass: 100%/7   | Total:  1h 06m | Avg:  9m 31s | Max: 17m 20s
  🟩 GCC13              Pass: 100%/3   | Total: 10m 51s | Avg:  3m 37s | Max:  3m 52s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 00s | Avg:  9m 00s | Max:  9m 00s | Hits:  55%/112   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 03s | Avg:  8m 03s | Max:  8m 03s | Hits:  55%/112   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
🟩 cxx_family
  🟩 Clang              Pass: 100%/30  | Total:  2h 24m | Avg:  4m 48s | Max: 17m 39s
  🟩 GCC                Pass: 100%/20  | Total:  1h 55m | Avg:  5m 45s | Max: 17m 20s
  🟩 MSVC               Pass: 100%/2   | Total: 17m 03s | Avg:  8m 31s | Max:  9m 00s | Hits:  55%/224   
  🟩 NVHPC              Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 26s
🟩 gpu
  🟩 v100               Pass: 100%/54  | Total:  4h 49m | Avg:  5m 21s | Max: 17m 39s | Hits:  55%/224   
🟩 jobs
  🟩 Build              Pass: 100%/49  | Total:  3h 23m | Avg:  4m 08s | Max:  9m 00s | Hits:  55%/224   
  🟩 Test               Pass: 100%/5   | Total:  1h 26m | Avg: 17m 18s | Max: 17m 39s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
  🟩 90a                Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s
🟩 std
  🟩 17                 Pass: 100%/29  | Total:  2h 23m | Avg:  4m 56s | Max: 17m 20s
  🟩 20                 Pass: 100%/25  | Total:  2h 26m | Avg:  5m 51s | Max: 17m 39s | Hits:  55%/224

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	pycuda
	CCCL C Parallel Library

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	pycuda
	CCCL C Parallel Library

🏃‍ Runner counts (total jobs: 54)

#	Runner
43	`linux-amd64-cpu16`
5	`linux-amd64-gpu-v100-latest-1`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`


Some follow-up work on p2p

a4be366

pciolkosz requested a review from a team as a code owner October 28, 2024 23:57

Merge branch 'main' into can_peer_acccess_to

932adeb

pciolkosz changed the title ~~Add can_peer_access_to API to device_ref and check both ways access in get_peers~~ [CUDAX] Add can_peer_access_to API to device_ref and check both ways access in get_peers Oct 29, 2024

miscco reviewed Oct 29, 2024

miscco approved these changes Oct 29, 2024

ericniebler approved these changes Oct 29, 2024

Change can_peer_access_to to has_peer_access_to

1f48c68

miscco approved these changes Oct 30, 2024

miscco merged commit 7ff1d7b into NVIDIA:main Oct 30, 2024
69 checks passed

fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 5, 2024

[CUDAX] Add has_peer_access_to API to device_ref and check both ways access in get_peers (NVIDIA#2642)

d0fae32
