Implements copy function #113

astroC86 · 2025-08-20T20:37:04Z

Motivation

Closes #98
Implements copy. I keep get and put as wrappers around the copy function so that the existing tests pass.
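
As a rough illustration of the wrapper idea, here is a minimal host-side sketch in plain Python rather than Triton. The function names mirror the PR, but the signatures, the `heaps` dictionary, and the offset-based addressing are assumptions made for this example, not the code in iris/iris.py.

```python
# Toy host-side model: each rank's heap is a plain Python list.
# Every name and signature below is an illustrative assumption,
# not the Triton code in iris/iris.py.

def copy(heaps, src_off, dst_off, count, from_rank, to_rank):
    """Copy `count` elements from from_rank's heap to to_rank's heap."""
    data = heaps[from_rank][src_off:src_off + count]
    heaps[to_rank][dst_off:dst_off + count] = data


def get(heaps, src_off, dst_off, count, cur_rank, remote_rank):
    """Pull: the remote rank is the source, the current rank the destination."""
    copy(heaps, src_off, dst_off, count, from_rank=remote_rank, to_rank=cur_rank)


def put(heaps, src_off, dst_off, count, cur_rank, remote_rank):
    """Push: the current rank is the source, the remote rank the destination."""
    copy(heaps, src_off, dst_off, count, from_rank=cur_rank, to_rank=remote_rank)


heaps = {0: [1, 2, 3, 4], 1: [0, 0, 0, 0]}
put(heaps, 0, 0, 2, cur_rank=0, remote_rank=1)  # rank 0 pushes [1, 2] to rank 1
get(heaps, 2, 2, 2, cur_rank=1, remote_rank=0)  # rank 1 pulls [3, 4] from rank 0
print(heaps[1])
```

In this model, get and put differ only in which rank they pass as the source and which as the destination, which is why tests written against either wrapper can keep passing.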

Technical Details

Test Plan

Test Result

(.iris_dev) root@2-6-0-gpu-mi300x1-192gb-devcloud-atl1:~/iris# mpirun -np 2 pytest ./tests/unittests/test_get.py 
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collecting ... rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collected 16 items                                                                                                                                                                                              

collected 16 items                                                                                                                                                                                              

tests/unittests/test_get.py ................................                                                                                                                                                              [100%]                                                                                                                                                              [100%]



============================================================================================== 16 passed in 13.63s ==============================================================================================
============================================================================================== 16 passed in 13.63s ==============================================================================================
(.iris_dev) root@2-6-0-gpu-mi300x1-192gb-devcloud-atl1:~/iris# mpirun -np 2 pytest ./tests/unittests/test_put.py 
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collecting ... ============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collected 16 items                                                                                                                                                                                              
collected 16 items                                                                                                                                                                                              

tests/unittests/test_put.py 
tests/unittests/test_put.py ................................                                                                                                                                                              [100%]                                                                                                                                                              [100%]

============================================================================================== 16 passed in 13.52s ==============================================================================================


============================================================================================== 16 passed in 13.51s ==============================================================================================

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

iris/iris.py

neoblizz · 2025-08-22T16:15:24Z

This PR will have to update the examples that use put or get; there should be a few.

mawad-amd

Yeah, there are a few other places where we use iris.put. But I think something about the semantics of the copy is not correct.

mawad-amd · 2025-08-22T15:29:44Z

tests/unittests/test_get.py

@@ -31,7 +31,7 @@ def get_kernel(
    # Loop over all ranks, get the stored data.
    # load to local register, accumulate.
    for target_rank in range(num_ranks):
-        iris.get(data + offsets, results + offsets, cur_rank, target_rank, heap_bases, mask=mask)
+        iris.copy(data + offsets, results + offsets, cur_rank, target_rank, heap_bases, mask=mask)


Shouldn't this be:

Suggested change

-        iris.copy(data + offsets, results + offsets, cur_rank, target_rank, heap_bases, mask=mask)
+        iris.copy(data + offsets, results + offsets, target_rank, cur_rank, heap_bases, mask=mask)

The code I suggest fails the test btw but it shouldn't according to the docstring.

I assume here that from_rank is always the current rank. If we want to allow the two to be interchanged, then perhaps this would be more appropriate? Let me know what you think.

@triton.jit
def copy(src_ptr, dst_ptr, from_rank, to_rank, cur_rank, heap_bases, mask=None):
    assert cur_rank == from_rank or cur_rank == to_rank, "Cannot copy between two arbitrary ranks"

    cur_base = tl.load(heap_bases + cur_rank)
    from_base = tl.load(heap_bases + from_rank)
    to_base = tl.load(heap_bases + to_rank)

    src_ptr_int = tl.cast(src_ptr, tl.uint64)
    src_offset = src_ptr_int - cur_base

    dst_ptr_int = tl.cast(dst_ptr, tl.uint64)
    dst_offset = dst_ptr_int - cur_base

    from_base_byte = tl.cast(from_base, tl.pointer_type(tl.int8))
    to_base_byte = tl.cast(to_base, tl.pointer_type(tl.int8))

    translated_src = tl.cast(from_base_byte + src_offset, src_ptr.dtype)
    translated_dst = tl.cast(to_base_byte + dst_offset, src_ptr.dtype)

    data = tl.load(translated_src, mask=mask)
    tl.store(translated_dst, data, mask=mask)
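
To make the address arithmetic in this proposal easier to follow, here is a small host-side model in plain Python. It is only a sketch: the `memory` dictionary, the base addresses, and the returned address pair are made up for illustration, and nothing here runs on a GPU or uses Triton.

```python
# Host-side model of the address arithmetic only; no Triton, no GPU.
# `memory` (a dict keyed by integer address) and the base addresses
# are made-up values for illustration.

def copy(memory, src_ptr, dst_ptr, from_rank, to_rank, cur_rank, heap_bases):
    assert cur_rank in (from_rank, to_rank), "Cannot copy between two arbitrary ranks"
    cur_base = heap_bases[cur_rank]
    # Both pointers are expressed in the current rank's address space.
    src_offset = src_ptr - cur_base
    dst_offset = dst_ptr - cur_base
    # Rebase each offset onto the heap of the rank that owns that side.
    translated_src = heap_bases[from_rank] + src_offset
    translated_dst = heap_bases[to_rank] + dst_offset
    memory[translated_dst] = memory[translated_src]
    return translated_src, translated_dst


heap_bases = [1000, 5000]
memory = {1008: "r0-data", 5008: "r1-data"}
src_ptr, dst_ptr = 1008, 1016  # as seen by rank 0

# get-like: rank 0 reads rank 1's copy of `src` into its own `dst`.
get_addrs = copy(memory, src_ptr, dst_ptr, from_rank=1, to_rank=0,
                 cur_rank=0, heap_bases=heap_bases)
# put-like: rank 0 writes its own `src` into rank 1's `dst`.
put_addrs = copy(memory, src_ptr, dst_ptr, from_rank=0, to_rank=1,
                 cur_rank=0, heap_bases=heap_bases)
```

In this model, whichever side belongs to the current rank translates back to the pointer it started from, so one of the two translations is effectively an identity.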

The solution you proposed here is good, but it adds the additional overhead of the two translates. I have been thinking about this, and I am not sure there is a way to resolve it cleanly.

I don’t really like the put/get names but maybe we will just stick to them for now. Let’s keep this PR open for now and we can come back to it later if we get better ideas. Thanks for your time looking into this and sorry this feature was not very well thought through.
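
The overhead concern above can be illustrated with a toy count of address translations. This is a sketch under assumptions: `translate`, `get_single`, and `copy_generic` are hypothetical names for this example, and the model says nothing about actual GPU cost.

```python
# Counts address translations in a toy host-side model.
# `translate`, `get_single` and `copy_generic` are hypothetical names.

calls = {"translate": 0}


def translate(ptr, cur_rank, target_rank, heap_bases):
    """Rebase a pointer from cur_rank's heap onto target_rank's heap."""
    calls["translate"] += 1
    return heap_bases[target_rank] + (ptr - heap_bases[cur_rank])


def get_single(memory, src_ptr, dst_ptr, cur_rank, remote_rank, heap_bases):
    # Only the remote source needs translating; dst_ptr is already local.
    memory[dst_ptr] = memory[translate(src_ptr, cur_rank, remote_rank, heap_bases)]


def copy_generic(memory, src_ptr, dst_ptr, from_rank, to_rank, cur_rank, heap_bases):
    # Both sides are translated, even though one of them is local.
    src = translate(src_ptr, cur_rank, from_rank, heap_bases)
    dst = translate(dst_ptr, cur_rank, to_rank, heap_bases)
    memory[dst] = memory[src]


heap_bases = [1000, 5000]
memory = {5008: 42}

get_single(memory, 1008, 1016, cur_rank=0, remote_rank=1, heap_bases=heap_bases)
single_calls = calls["translate"]

calls["translate"] = 0
copy_generic(memory, 1008, 1024, from_rank=1, to_rank=0, cur_rank=0,
             heap_bases=heap_bases)
generic_calls = calls["translate"]
```

Both calls move the same value, but the direction-agnostic version performs one more translation, because it cannot know ahead of time which side is local.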

Hi no worries at all! Thanks a lot for taking the time to review my solution!

astroC86 added 3 commits August 19, 2025 21:10

initial copy impl

1031a5d

Merge remote-tracking branch 'origin/main' into astroC86/get-or-put-t…

504b3f4

…o-copy

Merge branch 'main' into astroC86/get-or-put-to-copy

d1cc73f

astroC86 requested review from mawad-amd, neoblizz and BKP as code owners August 20, 2025 20:37

astroC86 force-pushed the astroC86/get-or-put-to-copy branch from 894410f to c2ca89c Compare August 20, 2025 20:38

Intial impl of copy

3853f82

astroC86 force-pushed the astroC86/get-or-put-to-copy branch from 940e3e9 to 3853f82 Compare August 20, 2025 20:39

github-actions bot and others added 2 commits August 20, 2025 20:39

Apply Ruff auto-fixes

ea6f2da

Merge branch 'main' into astroC86/get-or-put-to-copy

9975944

mawad-amd reviewed Aug 21, 2025

mawad-amd added core Core Iris library development iris Iris project issue labels Aug 21, 2025

astroC86 and others added 3 commits August 22, 2025 11:40

replaced get and put with copy

e85c1bb

Apply Ruff auto-fixes

f7e02bc

Merge branch 'main' into astroC86/get-or-put-to-copy

36b563f

astroC86 requested a review from mawad-amd August 22, 2025 12:31

mawad-amd reviewed Aug 22, 2025

mawad-amd Aug 22, 2025

mawad-amd Aug 22, 2025

astroC86 Aug 22, 2025 (edited)

mawad-amd Aug 25, 2025

astroC86 Aug 26, 2025