How to change single copy via xpmem execution to the sender process #10019

Hi Everyone,

@yosefe @tvegas1

I am currently examining the execution of MPI_Send (blocking send) with UCX in an intra-node scenario. At present, the memory copy (ucs_memcpy_relaxed()) is executed in the receiver process (rank), as depicted below.

![reciver_process_ntbt](https://github.com/user-attachments/assets/6dc2d7cc-132f-43b1-b89a-35119e189603)

Executing the same copy in the sender process instead, as shown below, could significantly reduce cache-to-cache data transfers and conserve memory bandwidth.

![sender_process_ntbt](https://github.com/user-attachments/assets/3ae36a79-7747-484a-971e-b2a1988ad449)

However, I am struggling to find a runtime configuration that would allow me to execute this transfer in the sender process with the hint UCS_ARCH_MEMCPY_NT_DEST and benchmark it. Could anyone provide some guidance or suggestions on this matter?
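
For illustration, a copy with a non-temporal destination can be sketched as follows. This is not UCX code and not how ucs_memcpy_relaxed() is implemented: `copy_nt_dest` is a hypothetical helper, and the sketch assumes an x86 CPU with SSE2 (elsewhere it falls back to a plain memcpy). The point is that the destination is written with streaming stores, which are meant to avoid allocating the destination lines in the copying core's cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not a UCX function): copy len bytes from src to dst,
 * writing the destination with non-temporal (streaming) stores where possible. */
static void copy_nt_dest(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy head bytes until dst is 16-byte aligned, which
     * _mm_stream_si128 requires for its destination. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len) {
        head = len;
    }
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    /* Bulk: unaligned loads from src, streaming stores to dst. */
    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        len -= 16;
    }

    /* Order the streaming stores before any later stores, e.g. a
     * completion flag that the peer process polls. */
    _mm_sfence();

    /* Tail: fewer than 16 bytes remain. */
    memcpy(d, s, len);
#else
    /* Assumption: no SSE2 available, so use an ordinary copy. */
    memcpy(dst, src, len);
#endif
}
```

In the scenario described here, the sender would call something like `copy_nt_dest(remote_buf, local_buf, len)` on the receiver's xpmem-attached buffer; `remote_buf` and `local_buf` are placeholder names.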

Thank you in advance for your assistance.

--Arun
