Contributor

@astroC86 astroC86 commented Aug 20, 2025

Motivation

Closes #98
Implements copy. get and put are kept as thin wrappers around the copy function so that the existing tests still pass.

Technical Details

Test Plan

Test Result

(.iris_dev) root@2-6-0-gpu-mi300x1-192gb-devcloud-atl1:~/iris# mpirun -np 2 pytest ./tests/unittests/test_get.py 
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collecting ... rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collected 16 items                                                                                                                                                                                              

collected 16 items                                                                                                                                                                                              

tests/unittests/test_get.py ................................                                                                                                                                                              [100%]                                                                                                                                                              [100%]



============================================================================================== 16 passed in 13.63s ==============================================================================================
============================================================================================== 16 passed in 13.63s ==============================================================================================
(.iris_dev) root@2-6-0-gpu-mi300x1-192gb-devcloud-atl1:~/iris# mpirun -np 2 pytest ./tests/unittests/test_put.py 
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collecting ... ============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /root/iris
configfile: pyproject.toml
plugins: mpi-0.6
collected 16 items                                                                                                                                                                                              
collected 16 items                                                                                                                                                                                              

tests/unittests/test_put.py 
tests/unittests/test_put.py ................................                                                                                                                                                              [100%]                                                                                                                                                              [100%]

============================================================================================== 16 passed in 13.52s ==============================================================================================


============================================================================================== 16 passed in 13.51s ==============================================================================================

Submission Checklist

@astroC86 astroC86 force-pushed the astroC86/get-or-put-to-copy branch from 894410f to c2ca89c Compare August 20, 2025 20:38
@astroC86 astroC86 force-pushed the astroC86/get-or-put-to-copy branch from 940e3e9 to 3853f82 Compare August 20, 2025 20:39
@mawad-amd mawad-amd added core Core Iris library development iris Iris project issue labels Aug 21, 2025
@astroC86 astroC86 requested a review from mawad-amd August 22, 2025 12:31
@neoblizz
Member

This PR will also have to update the examples that use put or get; there should be a few.

Collaborator

@mawad-amd mawad-amd left a comment

Yeah, there are a few other places where we use iris.put. But I think something about the semantics of copy is not correct.

@@ -31,7 +31,7 @@ def get_kernel(
     # Loop over all ranks, get the stored data.
     # load to local register, accumulate.
     for target_rank in range(num_ranks):
-        iris.get(data + offsets, results + offsets, cur_rank, target_rank, heap_bases, mask=mask)
+        iris.copy(data + offsets, results + offsets, cur_rank, target_rank, heap_bases, mask=mask)
Collaborator

Shouldn't this be:

Suggested change
- iris.copy(data + offsets, results + offsets, cur_rank, target_rank, heap_bases, mask=mask)
+ iris.copy(data + offsets, results + offsets, target_rank, cur_rank, heap_bases, mask=mask)

Collaborator

By the way, the code I suggested fails the test, but according to the docstring it shouldn't.

Contributor Author

@astroC86 astroC86 Aug 22, 2025

I assumed here that from_rank is always the current rank. If we want to allow the two ranks to be interchanged, then perhaps this would be more appropriate? Let me know what you think.

import triton
import triton.language as tl


@triton.jit
def copy(src_ptr, dst_ptr, from_rank, to_rank, cur_rank, heap_bases, mask=None):
    # One side of the copy must be the calling rank.
    assert cur_rank == from_rank or cur_rank == to_rank, "Cannot copy between two arbitrary ranks"

    # Heap base addresses of the current, source, and destination ranks.
    cur_base  = tl.load(heap_bases + cur_rank)
    from_base = tl.load(heap_bases + from_rank)
    to_base   = tl.load(heap_bases + to_rank)

    # Byte offsets of both pointers relative to the current rank's heap.
    src_ptr_int = tl.cast(src_ptr, tl.uint64)
    src_offset  = src_ptr_int - cur_base

    dst_ptr_int = tl.cast(dst_ptr, tl.uint64)
    dst_offset  = dst_ptr_int - cur_base

    from_base_byte = tl.cast(from_base, tl.pointer_type(tl.int8))
    to_base_byte   = tl.cast(to_base  , tl.pointer_type(tl.int8))

    # Re-apply the offsets to the source and destination heaps.
    translated_src = tl.cast(from_base_byte + src_offset, src_ptr.dtype)
    translated_dst = tl.cast(to_base_byte   + dst_offset, dst_ptr.dtype)

    data = tl.load(translated_src, mask=mask)
    tl.store(translated_dst, data, mask=mask)
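
For readers without a GPU at hand, the addressing in the kernel above can be modelled on the host in plain Python. This is only a rough sketch under stated assumptions: each rank's heap is a bytearray, the base addresses are made up, and the get and put wrappers (including their argument order) are hypothetical illustrations of how such wrappers might sit on top of a three-rank copy, not the wrappers in this PR.

```python
# Hypothetical host-side model of the proposed copy(); not Iris or Triton code.
# Each rank's heap is a bytearray; a "pointer" is a heap base plus a byte offset.

HEAP_SIZE = 16
heap_bases = [0x1000, 0x2000]  # made-up base addresses, one per rank
heaps = [bytearray(HEAP_SIZE), bytearray(HEAP_SIZE)]


def copy(src_ptr, dst_ptr, from_rank, to_rank, cur_rank, heap_bases):
    """Copy one byte; both pointers are expressed in cur_rank's address space."""
    assert cur_rank in (from_rank, to_rank), "Cannot copy between two arbitrary ranks"
    cur_base = heap_bases[cur_rank]
    # Offsets are taken relative to the current rank's heap...
    src_offset = src_ptr - cur_base
    dst_offset = dst_ptr - cur_base
    # ...and re-applied to the source and destination ranks' heaps.
    heaps[to_rank][dst_offset] = heaps[from_rank][src_offset]


def get(src_ptr, dst_ptr, cur_rank, remote_rank, heap_bases):
    """Hypothetical wrapper: read from remote_rank into the current rank."""
    copy(src_ptr, dst_ptr, remote_rank, cur_rank, cur_rank, heap_bases)


def put(src_ptr, dst_ptr, cur_rank, remote_rank, heap_bases):
    """Hypothetical wrapper: write from the current rank into remote_rank."""
    copy(src_ptr, dst_ptr, cur_rank, remote_rank, cur_rank, heap_bases)


heaps[1][3] = 42  # a value living on rank 1
base0 = heap_bases[0]
get(base0 + 3, base0 + 5, 0, 1, heap_bases)  # rank 0 pulls it into its offset 5
put(base0 + 5, base0 + 7, 0, 1, heap_bases)  # rank 0 pushes it to rank 1, offset 7
print(heaps[0][5], heaps[1][7])
```

In this model the direction of data flow is carried entirely by from_rank and to_rank, so swapping them turns a get into a put.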

Collaborator

The solution you proposed here is good, but it adds the overhead of two address translations. I have been thinking about this and I am not sure there is a way to resolve it cleanly.
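
To illustrate the overhead in question: the proposed copy translates both pointers, but because cur_rank must equal either from_rank or to_rank, one of the two translations always maps a pointer back to itself. A minimal sketch with made-up base addresses, where translate is a hypothetical helper and not an Iris function:

```python
# Hypothetical integer model of address translation; the base addresses are made up.
heap_bases = [0x1000, 0x2000]


def translate(ptr, cur_rank, rank, heap_bases):
    """Map a pointer in cur_rank's heap to the same offset in rank's heap."""
    offset = ptr - heap_bases[cur_rank]
    return heap_bases[rank] + offset


cur_rank, remote_rank = 0, 1
ptr = heap_bases[cur_rank] + 0x40

# Translating to the current rank is the identity: the same address comes back.
local = translate(ptr, cur_rank, cur_rank, heap_bases)
# Only the translation to the remote rank changes the address.
remote = translate(ptr, cur_rank, remote_rank, heap_bases)
print(hex(local), hex(remote))
```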

I don’t really like the put/get names but maybe we will just stick to them for now. Let’s keep this PR open for now and we can come back to it later if we get better ideas. Thanks for your time looking into this and sorry this feature was not very well thought through.

Contributor Author

Hi, no worries at all! Thanks a lot for taking the time to review my solution!

Development

Successfully merging this pull request may close these issues.

[Feature]: Update put and get to copy
3 participants