Skip to content

How to change single copy VIA xpmem execution to the sender process #10019

@arun-chandran-edarath

Description

@arun-chandran-edarath

Hi Everyone,

@yosefe @tvegas1

I am currently examining the execution of MPI_Send (Blocking send) with UCX in an intra_node scenario. At present, the memory transfer (ucs_memcpy_relaxed()) is executed in the receiver process (rank or processor), as depicted below.

reciver_process_ntbt

By executing the same in the sender process, as shown below, we could significantly reduce cache-to-cache data transfers and conserve memory bandwidth.

sender_process_ntbt

However, I am struggling to find a runtime configuration that would allow me to execute this transfer in the sender process with the hint UCS_ARCH_MEMCPY_NT_DEST and benchmark it. Could anyone provide some guidance or suggestions on this matter?

Thank you in advance for your assistance.

--Arun

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions