Skip to content

[P2P] Unify intra-node and inter-node API.#716

Open
zhongjiechen wants to merge 2 commits intomainfrom
p2p-unify
Open

[P2P] Unify intra-node and inter-node API.#716
zhongjiechen wants to merge 2 commits intomainfrom
p2p-unify

Conversation

@zhongjiechen
Copy link
Copy Markdown
Member

Description

Fixes #715

Type of Change

  • Bug fix
  • New feature
  • Documentation update

How Has This Been Tested?

Include any tests here.

  • Unit tests
  • Integration tests
  • Manual testing

Checklist

  • My code follows the style guidelines, e.g. format.sh.
  • I have run build_and_install.sh to verify compilation.
  • I have removed redundant variables and comments.
  • I have updated the documentation.
  • I have added tests.

@zhongjiechen
Copy link
Copy Markdown
Member Author

zhongjiechen commented Feb 7, 2026

It passed the following intra-node tests and all inter-node tests:

torchrun --nnodes=1 --nproc_per_node=2 benchmarks/benchmark_uccl.py --ipc
torchrun --nproc_per_node=8 benchmarks/benchmark_uccl_collective.py --ring
torchrun --nproc_per_node=2 tests/test_engine_nvlink.py

@zhongjiechen
Copy link
Copy Markdown
Member Author

zhongjiechen commented Feb 7, 2026

The basic idea is to use conn->is_local() to check uds_sockfd_, which is only used by intra-node communication. For inter-node communication, it is set to -1.

The one-sided operations (write/read) require different metadata types depending on the communication: FifoItem for RDMA and IpcTransferInfo for IPC. So, use opaque-metadata overloads (void const* meta, size_t meta_len) to let pybind pass raw bytes from advertise() without knowing the underlying type. The C++ side checks conn->is_local() to deserialize as IpcTransferInfo or FifoItem accordingly.

@praveingk
Copy link
Copy Markdown
Collaborator

Thanks @zhongjiechen, this is great. I see the changes don't apply for readv/writev since there is no readv_ipc/writev_ipc. Would you be adding that?

@zhongjiechen
Copy link
Copy Markdown
Member Author

Thanks @zhongjiechen, this is great. I see the changes don't apply for readv/writev since there is no readv_ipc/writev_ipc. Would you be adding that?

I think we can support that in another PR.

@YangZhou1997
Copy link
Copy Markdown
Member

Hi @zhongjiechen , we recently upgrade P2P from UDS to shm-backed jring in #718. So you might need to resolve some conflict

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P2P] Unify intra-node and inter-node API

3 participants