Question
NVSHMEM 3.x | MoE training (DeepSeek DeepEP + custom modules in one process)
Background
We're building a training framework where multiple independent modules (e.g., DeepEP for MoE all-to-all, a custom overlap module) each call nvshmemx_init_attr(NVSHMEMX_INIT_WITH_UNIQUEID, ...) with potentially different rank/nranks. We ran into some issues and did a source audit — hoping to confirm our understanding and get advice.
Q1: Is it possible to change nranks via finalize + re-init?
From reading the source, it appears that nvshmem_finalize() resets the initialized flag but not the bootstrapped flag. So after finalize + re-init, the new rank/nranks/uid parameters are silently ignored — bootstrap is skipped entirely and the old boot_handle (pg_rank, pg_size, node topology, etc.) is reused.
Is this intended? Is there any supported way to fully reset bootstrap state within a process so that a subsequent init can join a different-sized communication world?
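To make the behavior we think we are seeing concrete, here is a toy model of the lifecycle. It is a sketch of our reading only, not NVSHMEM source: `ToyRuntime`, `BootHandle`, and their members are hypothetical names, and the real state machine may differ.

```cpp
// Toy model of the init/finalize lifecycle as we understand it.
// NOT NVSHMEM code; every name below is hypothetical.
struct BootHandle {
    int pg_rank = -1;
    int pg_size = 0;
};

struct ToyRuntime {
    bool bootstrapped = false;  // in this model, never cleared by finalize()
    bool initialized = false;   // cleared by finalize()
    BootHandle boot;

    void init(int rank, int nranks) {
        if (!bootstrapped) {
            // Bootstrap runs only the first time in the process.
            boot.pg_rank = rank;
            boot.pg_size = nranks;
            bootstrapped = true;
        }
        // Otherwise rank/nranks are ignored and the old handle is reused.
        initialized = true;
    }

    void finalize() {
        initialized = false;  // bootstrapped stays set
    }
};
```

In this model, `init(0, 8)`, then `finalize()`, then `init(0, 32)` leaves `boot.pg_size` at 8, which matches what we observe.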
Q2: Multiple modules calling init with different nranks — is this safe?
For example:
- Module A inits with nranks=8 (intra-node only)
- Module B inits with nranks=32 (all expert-parallel ranks)
Since NVSHMEM is a process-global singleton, the second init just bumps the refcount and its parameters are discarded. Module B ends up in Module A's 8-PE world without knowing it. This seems fundamentally unsafe: wrong PE numbering, nvshmem_malloc blocking on a global barrier that Module B's subset can't satisfy, etc.
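As a stopgap, we are considering a framework-level guard that turns the silent mismatch into a loud error. The sketch below is hypothetical (`InitGuard` is our own name, not an NVSHMEM API), and it assumes each module can be made to call it before its own init.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical process-wide guard; not part of NVSHMEM.
// It remembers the first world a module asked for and rejects any later
// request that differs, instead of letting it be silently ignored.
class InitGuard {
public:
    void acquire(const std::string& module, int rank, int nranks) {
        if (!bound_) {
            rank_ = rank;
            nranks_ = nranks;
            bound_ = true;
        } else if (rank != rank_ || nranks != nranks_) {
            throw std::runtime_error(
                module + ": requested rank " + std::to_string(rank) + " of " +
                std::to_string(nranks) + ", but this process is already bound to rank " +
                std::to_string(rank_) + " of " + std::to_string(nranks_));
        }
        ++refcount_;
    }

    // The binding is kept even when the count drops to zero, mirroring
    // our observation that bootstrap state outlives finalize.
    void release() {
        if (refcount_ > 0) --refcount_;
    }

    int refcount() const { return refcount_; }

private:
    bool bound_ = false;
    int refcount_ = 0;
    int rank_ = -1;
    int nranks_ = 0;
};
```

With this guard, Module A acquiring (rank 0, nranks 8) followed by Module B acquiring (rank 0, nranks 32) throws rather than proceeding in the wrong world.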
What's the recommended practice here? Our best guess is:
- Coordinate a single init with the superset of all PEs
- Use nvshmem_team_split_strided() for per-module sub-groups
- Use team-based collectives + nvshmem_team_translate_pe() for RMA
Is this the right approach? Any guidance on coordinating NVSHMEM across independently-developed modules would be very helpful.
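To spell out the index arithmetic we expect from the team approach, here is a pure-C++ model of a strided team. It makes no NVSHMEM calls; `StridedTeam` and the helper functions are hypothetical names, and `node_team` assumes ranks are laid out contiguously per node. In real code we would expect nvshmem_team_translate_pe() to do this translation, and our understanding is that it returns -1 for a PE outside the destination team.

```cpp
// Pure-arithmetic model of a strided team inside one world.
// Hypothetical helper types; no NVSHMEM calls are made here.
struct StridedTeam {
    int start;   // world PE of team member 0
    int stride;  // distance in world PEs between consecutive members
    int size;    // number of members
};

// Team-local index -> world PE, or -1 if the index is out of range.
int team_to_world(const StridedTeam& t, int team_pe) {
    if (team_pe < 0 || team_pe >= t.size) return -1;
    return t.start + team_pe * t.stride;
}

// World PE -> team-local index, or -1 if the PE is not a member.
int world_to_team(const StridedTeam& t, int world_pe) {
    int offset = world_pe - t.start;
    if (t.stride <= 0 || offset < 0 || offset % t.stride != 0) return -1;
    int index = offset / t.stride;
    return index < t.size ? index : -1;
}

// Intra-node team containing world_pe, assuming contiguous ranks per node.
StridedTeam node_team(int world_pe, int pes_per_node) {
    return StridedTeam{(world_pe / pes_per_node) * pes_per_node, 1, pes_per_node};
}
```

For the example above (32 PEs, 8 per node), world PE 19 falls in the team starting at PE 16 and has team-local index 3, so an RMA target expressed as team index 3 would need to be translated back to world PE 19.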
Context
This came up while integrating DeepEP (which has its own init/finalize lifecycle) into PaddlePaddle alongside other NVSHMEM-based modules. Thanks in advance!