feat: add configurable recv buffer sizing for P2PAllToAll #10

Open
crgg1433 wants to merge 1 commit into perplexityai:main from crgg1433:optimize-recv-buffer

crgg1433 commented Mar 19, 2026

Summary

This PR adds configurable recv buffer sizing for P2PAllToAll via two optional parameters: max_recv_tokens and recv_buffer_factor.

Three sizing strategies are supported (in order of priority):

  1. max_recv_tokens_override: explicit override for the recv buffer token capacity. Recommended when the routing behavior is well understood, for best sizing and memory efficiency.
  2. recv_buffer_factor: multiplier on the balanced estimate.
  3. Default: worst-case upper bound, i.e., extremely imbalanced routing in which all tokens' top-k experts are on one rank, or every token is routed to every local expert on one rank.
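
The priority order above can be sketched as follows. This is a minimal illustration, not the PR's compute_max_recv_tokens(): the function name pick_recv_tokens is hypothetical, the balanced and worst-case estimates are taken as precomputed inputs because their formulas are not shown here, and rounding up with ceil is an assumption. Clamping to the worst-case bound mirrors the behavior exercised by the test_factor_clamped_to_worst_case and test_override_clamped_to_worst_case tests.

```python
import math
from typing import Optional


def pick_recv_tokens(
    worst_case: int,
    balanced: int,
    recv_buffer_factor: Optional[float] = None,
    max_recv_tokens_override: Optional[int] = None,
) -> int:
    """Hypothetical sketch of the three-level sizing priority.

    worst_case and balanced are assumed to be computed elsewhere.
    """
    if max_recv_tokens_override is not None:
        # 1. An explicit override wins over everything else.
        requested = max_recv_tokens_override
    elif recv_buffer_factor is not None:
        # 2. Scale the balanced estimate (rounding up is an assumption).
        requested = math.ceil(balanced * recv_buffer_factor)
    else:
        # 3. Default: keep the original worst-case allocation.
        return worst_case
    # Never allocate more than the worst-case bound.
    return min(requested, worst_case)


# Illustrative numbers only: 1048 is the worst-case bound seen in the
# test log; 256 is a made-up balanced estimate.
print(pick_recv_tokens(1048, 256))                           # default
print(pick_recv_tokens(1048, 256, recv_buffer_factor=2.0))   # factor
print(pick_recv_tokens(1048, 256, max_recv_tokens_override=300))
```

Taking the two estimates as inputs keeps the sketch independent of how the real implementation accounts for expert padding or multiple DP groups.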

Usage

# Before: 
all_to_all = P2PAllToAll(max_num_tokens=128, num_experts=128, ...)

# After: 
all_to_all = P2PAllToAll(max_num_tokens=128, num_experts=128, ...,
                         recv_buffer_factor=2.0)

An optional runtime buffer utilization check is available via PPLX_CHECK_RECV_BUF_USAGE=1 to help tune buffer sizing (requires a d2h sync, so it is off by default). It logs the percentage of buffer utilization per rank, helping identify how much the buffer can be further shrunk.
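
As a rough illustration of what such a check reports, the sketch below computes a utilization percentage from a received-token count and the buffer capacity. Both helper names are hypothetical, and comparing the environment variable against the string "1" is an assumption about how the flag is read; the actual check runs on device counters and needs the d2h sync mentioned above.

```python
import os
from typing import Mapping, Optional


def usage_check_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when PPLX_CHECK_RECV_BUF_USAGE is set to "1" (assumed parsing)."""
    env = os.environ if env is None else env
    return env.get("PPLX_CHECK_RECV_BUF_USAGE") == "1"


def recv_buffer_usage_percent(tokens_received: int, capacity_tokens: int) -> float:
    """Percentage of the recv buffer's token capacity that was actually used."""
    if capacity_tokens <= 0:
        raise ValueError("capacity_tokens must be positive")
    return 100.0 * tokens_received / capacity_tokens


if usage_check_enabled({"PPLX_CHECK_RECV_BUF_USAGE": "1"}):
    # Illustrative numbers: 262 tokens received into a 1048-token buffer.
    pct = recv_buffer_usage_percent(262, 1048)
    print(f"recv buffer utilization: {pct:.1f}%")
```

A consistently low percentage across ranks suggests the buffer could be shrunk; a value close to 100% means there is little headroom left before an overflow.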

The buffer sizing logic is extracted into a standalone compute_max_recv_tokens() function for testability.

Purpose

The original implementation computes the recv buffer size from a worst-case bound that assumes maximally imbalanced routing. This can allocate much more GPU memory (~4x in practice) than is actually required, wasting memory or even causing CUDA OOM at init time on memory-constrained setups.

Default behavior is unchanged — when neither parameter is specified, the original worst-case allocation is used.

Important caveat:

The current one-sided RDMA write path has no graceful overflow handling. If the buffer overflows, the Rust RDMA handler will panic. Token dropping would require changes to the CUDA dispatch kernel and the Rust RDMA layer, which is out of scope here. Users should verify that their buffer is sufficient using the overflow check before running production workloads.

Test

Added unit tests to cover the buffer sizing logic:

tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_default_uses_worst_case PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_balanced_less_than_worst_case PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_factor_scales_balanced_estimate PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_factor_one_equals_balanced PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_factor_clamped_to_worst_case PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_factor_respects_expert_padding PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_override_used_directly PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_override_clamped_to_worst_case
---------------------------------------------------------------------------------- live log call -----------------------------------------------------------------------------------
[2026-03-19 00:45:33.726] 55006 WARNING  pplx_garden.kernels.p2p_all_to_all max_recv_tokens (999999999) exceeds worst-case bound (1048); clamping to worst-case
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_override_takes_priority_over_factor PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::TestComputeMaxRecvTokens::test_multi_dp_group_increases_buffer PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-FP32] [2026-03-19 00:45:35.630] 55072 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmp7s30z9k6/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:45:35.650] 55073 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmp7s30z9k6/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:45:37.226] 55073 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:45:37.226] 55072 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:45:37.461] 55072 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:37.484] 55073 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:37.713] 55073 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:37.713] 55072 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:38.250] 55073 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:45:38.250] 55072 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-FP32-MIXED] [2026-03-19 00:45:40.700] 55278 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpf8hd4sfh/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:45:40.803] 55277 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpf8hd4sfh/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:45:42.337] 55278 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:45:42.337] 55277 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:45:42.571] 55277 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:42.593] 55278 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:42.833] 55277 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:42.835] 55278 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:43.359] 55278 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:45:43.359] 55277 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-FP32-NVL] [2026-03-19 00:45:45.840] 55482 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpbpljfyv3/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:45:45.860] 55483 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpbpljfyv3/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:45:47.465] 55483 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:45:47.466] 55482 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:45:47.953] 55483 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (2)
[2026-03-19 00:45:47.954] 55482 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (2)
[2026-03-19 00:45:48.194] 55482 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:48.196] 55483 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:49.015] 55482 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:45:49.015] 55483 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-BF16] [2026-03-19 00:45:51.518] 55707 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpcysgnayb/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:45:51.538] 55708 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpcysgnayb/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:45:53.087] 55708 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:45:53.087] 55707 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:45:53.322] 55707 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:53.344] 55708 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:53.580] 55708 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:53.581] 55707 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:54.115] 55708 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:45:54.115] 55707 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-BF16-PADDED] [2026-03-19 00:45:56.617] 55913 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpkg9e7tr6/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:45:56.637] 55912 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpkg9e7tr6/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:45:58.247] 55913 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:45:58.247] 55912 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:45:58.484] 55912 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:58.506] 55913 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:45:58.740] 55912 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:58.744] 55913 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:45:59.275] 55913 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:45:59.275] 55912 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-FP8] [2026-03-19 00:46:01.739] 56118 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpns2tjtkp/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:46:01.741] 56117 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpns2tjtkp/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:46:03.331] 56118 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:46:03.331] 56117 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:46:03.568] 56117 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:46:03.590] 56118 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:46:03.814] 56117 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:03.817] 56118 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:04.357] 56117 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:46:04.357] 56118 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP4-NIC1-FP32] [2026-03-19 00:46:07.439] 56322 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmp1zbvlnzm/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:07.579] 56324 INFO     pplx_garden.distributed.process_group [rank=2] Initializing global process group. device=cuda:2, init_method=file:///tmp/tmp1zbvlnzm/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:07.620] 56323 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmp1zbvlnzm/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:07.743] 56325 INFO     pplx_garden.distributed.process_group [rank=3] Initializing global process group. device=cuda:3, init_method=file:///tmp/tmp1zbvlnzm/pplx_garden_parallel_init, world_size=4
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[2026-03-19 00:46:10.247] 56322 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[2026-03-19 00:46:10.247] 56323 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:46:10.247] 56325 INFO     pplx_garden.distributed.process_group [rank=3] Initialized global process group.
[2026-03-19 00:46:10.247] 56324 INFO     pplx_garden.distributed.process_group [rank=2] Initialized global process group.
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:46:10.660] 56324 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:10.665] 56322 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:10.668] 56323 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:10.681] 56325 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:11.121] 56324 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:11.123] 56322 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:11.123] 56323 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:11.124] 56325 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:11.815] 56322 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:46:11.835] 56324 INFO     pplx_garden.distributed.process_group [rank=2] Destroyed global process group.
[2026-03-19 00:46:11.835] 56323 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:46:11.835] 56325 INFO     pplx_garden.distributed.process_group [rank=3] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP4-NIC2-BF16] [2026-03-19 00:46:15.229] 56748 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmply2ugppp/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:15.336] 56749 INFO     pplx_garden.distributed.process_group [rank=2] Initializing global process group. device=cuda:2, init_method=file:///tmp/tmply2ugppp/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:15.349] 56750 INFO     pplx_garden.distributed.process_group [rank=3] Initializing global process group. device=cuda:3, init_method=file:///tmp/tmply2ugppp/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:15.461] 56747 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmply2ugppp/pplx_garden_parallel_init, world_size=4
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[2026-03-19 00:46:17.939] 56747 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[2026-03-19 00:46:17.939] 56748 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:46:17.940] 56750 INFO     pplx_garden.distributed.process_group [rank=3] Initialized global process group.
[2026-03-19 00:46:17.940] 56749 INFO     pplx_garden.distributed.process_group [rank=2] Initialized global process group.
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:46:18.383] 56747 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:18.396] 56748 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:18.398] 56749 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:18.407] 56750 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (1)
[2026-03-19 00:46:20.334] 56750 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:20.335] 56747 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:20.340] 56749 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:20.390] 56748 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:20.983] 56749 INFO     pplx_garden.distributed.process_group [rank=2] Destroyed global process group.
[2026-03-19 00:46:20.984] 56747 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:46:21.004] 56750 INFO     pplx_garden.distributed.process_group [rank=3] Destroyed global process group.
[2026-03-19 00:46:21.004] 56748 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP4-DP2-NIC1-BF16] [2026-03-19 00:46:24.356] 57173 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmp1usg559d/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:24.363] 57175 INFO     pplx_garden.distributed.process_group [rank=3] Initializing global process group. device=cuda:3, init_method=file:///tmp/tmp1usg559d/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:24.568] 57172 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmp1usg559d/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:46:24.601] 57174 INFO     pplx_garden.distributed.process_group [rank=2] Initializing global process group. device=cuda:2, init_method=file:///tmp/tmp1usg559d/pplx_garden_parallel_init, world_size=4
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[2026-03-19 00:46:27.038] 57172 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[2026-03-19 00:46:27.038] 57173 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:46:27.038] 57175 INFO     pplx_garden.distributed.process_group [rank=3] Initialized global process group.
[2026-03-19 00:46:27.038] 57174 INFO     pplx_garden.distributed.process_group [rank=2] Initialized global process group.
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:46:27.631] 57173 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:46:27.649] 57172 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:46:27.786] 57175 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:46:27.789] 57174 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:46:28.199] 57172 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:28.199] 57173 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:28.202] 57175 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:28.203] 57174 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:29.603] 57173 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:46:29.613] 57175 INFO     pplx_garden.distributed.process_group [rank=3] Destroyed global process group.
[2026-03-19 00:46:29.613] 57172 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:46:29.613] 57174 INFO     pplx_garden.distributed.process_group [rank=2] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP8-BF16] [2026-03-19 00:46:33.982] 57586 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:34.271] 57585 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:34.602] 57590 INFO     pplx_garden.distributed.process_group [rank=5] Initializing global process group. device=cuda:5, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:34.711] 57587 INFO     pplx_garden.distributed.process_group [rank=2] Initializing global process group. device=cuda:2, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:34.858] 57592 INFO     pplx_garden.distributed.process_group [rank=7] Initializing global process group. device=cuda:7, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:34.998] 57588 INFO     pplx_garden.distributed.process_group [rank=3] Initializing global process group. device=cuda:3, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:35.149] 57591 INFO     pplx_garden.distributed.process_group [rank=6] Initializing global process group. device=cuda:6, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:35.197] 57589 INFO     pplx_garden.distributed.process_group [rank=4] Initializing global process group. device=cuda:4, init_method=file:///tmp/tmplqpmxs5x/pplx_garden_parallel_init, world_size=8
[Gloo] Rank 7 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 1 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 0 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 2 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 3 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 4 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 6 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 5 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[2026-03-19 00:46:38.888] 57585 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[2026-03-19 00:46:38.888] 57590 INFO     pplx_garden.distributed.process_group [rank=5] Initialized global process group.
[2026-03-19 00:46:38.888] 57586 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:46:38.888] 57588 INFO     pplx_garden.distributed.process_group [rank=3] Initialized global process group.
[2026-03-19 00:46:38.888] 57592 INFO     pplx_garden.distributed.process_group [rank=7] Initialized global process group.
[2026-03-19 00:46:38.888] 57589 INFO     pplx_garden.distributed.process_group [rank=4] Initialized global process group.
[2026-03-19 00:46:38.888] 57591 INFO     pplx_garden.distributed.process_group [rank=6] Initialized global process group.
[2026-03-19 00:46:38.888] 57587 INFO     pplx_garden.distributed.process_group [rank=2] Initialized global process group.
[Gloo] Rank 0 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 2 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 3 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 1 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 4 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 5 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 7 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 6 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:46:39.573] 57585 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:39.575] 57587 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:39.576] 57589 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:39.579] 57590 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:39.583] 57591 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:39.599] 57592 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:39.601] 57588 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:39.602] 57586 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:41.806] 57586 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:41.833] 57587 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:41.854] 57591 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:41.859] 57590 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:41.872] 57592 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:41.886] 57585 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:41.889] 57588 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:42.356] 57589 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:43.441] 57586 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:46:43.471] 57592 INFO     pplx_garden.distributed.process_group [rank=7] Destroyed global process group.
[2026-03-19 00:46:43.471] 57585 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:46:43.492] 57587 INFO     pplx_garden.distributed.process_group [rank=2] Destroyed global process group.
[2026-03-19 00:46:43.502] 57588 INFO     pplx_garden.distributed.process_group [rank=3] Destroyed global process group.
[2026-03-19 00:46:43.512] 57589 INFO     pplx_garden.distributed.process_group [rank=4] Destroyed global process group.
[2026-03-19 00:46:43.512] 57591 INFO     pplx_garden.distributed.process_group [rank=6] Destroyed global process group.
[2026-03-19 00:46:43.512] 57590 INFO     pplx_garden.distributed.process_group [rank=5] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP8-FP8] [2026-03-19 00:46:48.581] 58505 INFO     pplx_garden.distributed.process_group [rank=7] Initializing global process group. device=cuda:7, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:48.767] 58498 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:49.084] 58503 INFO     pplx_garden.distributed.process_group [rank=5] Initializing global process group. device=cuda:5, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:49.129] 58501 INFO     pplx_garden.distributed.process_group [rank=3] Initializing global process group. device=cuda:3, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:49.354] 58500 INFO     pplx_garden.distributed.process_group [rank=2] Initializing global process group. device=cuda:2, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:49.354] 58502 INFO     pplx_garden.distributed.process_group [rank=4] Initializing global process group. device=cuda:4, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:49.508] 58499 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[2026-03-19 00:46:49.609] 58504 INFO     pplx_garden.distributed.process_group [rank=6] Initializing global process group. device=cuda:6, init_method=file:///tmp/tmpaxutqjvc/pplx_garden_parallel_init, world_size=8
[Gloo] Rank 0 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 1 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 3 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 2 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 4 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 5 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 7 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 6 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[2026-03-19 00:46:53.288] 58501 INFO     pplx_garden.distributed.process_group [rank=3] Initialized global process group.
[2026-03-19 00:46:53.288] 58499 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:46:53.288] 58503 INFO     pplx_garden.distributed.process_group [rank=5] Initialized global process group.
[2026-03-19 00:46:53.288] 58504 INFO     pplx_garden.distributed.process_group [rank=6] Initialized global process group.
[2026-03-19 00:46:53.288] 58498 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[2026-03-19 00:46:53.288] 58502 INFO     pplx_garden.distributed.process_group [rank=4] Initialized global process group.
[2026-03-19 00:46:53.288] 58505 INFO     pplx_garden.distributed.process_group [rank=7] Initialized global process group.
[2026-03-19 00:46:53.288] 58500 INFO     pplx_garden.distributed.process_group [rank=2] Initialized global process group.
[Gloo] Rank 0 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 2 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 1 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 3 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 4 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 6 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 5 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 7 is connected to 7 peer ranks. Expected number of connected peer ranks is : 7
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:46:53.983] 58503 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:53.985] 58498 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:53.989] 58504 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:53.994] 58502 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:53.998] 58501 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:53.999] 58500 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:54.000] 58499 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:54.011] 58505 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (8) + NVLink (1)
[2026-03-19 00:46:57.938] 58500 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:57.969] 58501 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:57.972] 58504 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:57.989] 58505 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:57.995] 58499 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:57.996] 58502 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:58.051] 58503 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:58.506] 58498 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:46:59.899] 58498 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:46:59.909] 58499 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:46:59.909] 58501 INFO     pplx_garden.distributed.process_group [rank=3] Destroyed global process group.
[2026-03-19 00:46:59.909] 58500 INFO     pplx_garden.distributed.process_group [rank=2] Destroyed global process group.
[2026-03-19 00:46:59.922] 58503 INFO     pplx_garden.distributed.process_group [rank=5] Destroyed global process group.
[2026-03-19 00:46:59.922] 58504 INFO     pplx_garden.distributed.process_group [rank=6] Destroyed global process group.
[2026-03-19 00:46:59.922] 58502 INFO     pplx_garden.distributed.process_group [rank=4] Destroyed global process group.
[2026-03-19 00:46:59.922] 58505 INFO     pplx_garden.distributed.process_group [rank=7] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP4-FP8-NVL2] [2026-03-19 00:47:03.947] 59412 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmp81r7p9i2/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:47:03.965] 59411 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmp81r7p9i2/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:47:03.990] 59413 INFO     pplx_garden.distributed.process_group [rank=2] Initializing global process group. device=cuda:2, init_method=file:///tmp/tmp81r7p9i2/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:47:04.263] 59414 INFO     pplx_garden.distributed.process_group [rank=3] Initializing global process group. device=cuda:3, init_method=file:///tmp/tmp81r7p9i2/pplx_garden_parallel_init, world_size=4
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[2026-03-19 00:47:06.689] 59411 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[2026-03-19 00:47:06.689] 59414 INFO     pplx_garden.distributed.process_group [rank=3] Initialized global process group.
[2026-03-19 00:47:06.689] 59412 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:47:06.689] 59413 INFO     pplx_garden.distributed.process_group [rank=2] Initialized global process group.
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:47:07.460] 59412 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:07.471] 59411 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:07.561] 59414 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:07.571] 59413 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:09.546] 59411 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:09.566] 59412 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:09.599] 59413 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:09.671] 59414 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:11.288] 59412 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
[2026-03-19 00:47:11.288] 59411 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:47:11.298] 59414 INFO     pplx_garden.distributed.process_group [rank=3] Destroyed global process group.
[2026-03-19 00:47:11.301] 59413 INFO     pplx_garden.distributed.process_group [rank=2] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP4-DP2-NVL2] [2026-03-19 00:47:14.412] 59887 INFO     pplx_garden.distributed.process_group [rank=3] Initializing global process group. device=cuda:3, init_method=file:///tmp/tmps2xnty9g/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:47:14.900] 59886 INFO     pplx_garden.distributed.process_group [rank=2] Initializing global process group. device=cuda:2, init_method=file:///tmp/tmps2xnty9g/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:47:15.004] 59885 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmps2xnty9g/pplx_garden_parallel_init, world_size=4
[2026-03-19 00:47:15.004] 59884 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmps2xnty9g/pplx_garden_parallel_init, world_size=4
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[2026-03-19 00:47:17.512] 59884 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[2026-03-19 00:47:17.512] 59885 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:47:17.512] 59887 INFO     pplx_garden.distributed.process_group [rank=3] Initialized global process group.
[2026-03-19 00:47:17.512] 59886 INFO     pplx_garden.distributed.process_group [rank=2] Initialized global process group.
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:47:18.569] 59885 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:18.577] 59884 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:18.694] 59886 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:18.695] 59887 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (4) + NVLink (2)
[2026-03-19 00:47:19.122] 59885 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:19.125] 59884 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:19.131] 59886 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:19.131] 59887 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:20.588] 59887 INFO     pplx_garden.distributed.process_group [rank=3] Destroyed global process group.
[2026-03-19 00:47:20.609] 59886 INFO     pplx_garden.distributed.process_group [rank=2] Destroyed global process group.
[2026-03-19 00:47:20.609] 59884 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:47:20.609] 59885 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-EMPTY] [2026-03-19 00:47:23.360] 60346 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmp3y2hehlr/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:47:23.362] 60345 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmp3y2hehlr/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:47:24.968] 60346 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:47:24.968] 60345 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:47:25.191] 60345 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:25.213] 60346 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:25.441] 60346 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:25.445] 60345 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:25.993] 60345 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:47:25.993] 60346 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-FP32-FACTOR2X] [2026-03-19 00:47:28.509] 60551 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpae820bom/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:47:28.529] 60550 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpae820bom/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:47:30.130] 60551 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:47:30.130] 60550 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:47:30.364] 60550 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:30.387] 60551 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:30.623] 60550 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:30.628] 60551 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:31.159] 60550 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:47:31.159] 60551 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-BF16-PADDED-FACTOR1.5X] [2026-03-19 00:47:33.675] 60755 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpdln_zwlz/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:47:33.695] 60756 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpdln_zwlz/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:47:35.202] 60756 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:47:35.202] 60755 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:47:35.442] 60755 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:35.465] 60756 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:35.693] 60756 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:35.694] 60755 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:36.229] 60755 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:47:36.230] 60756 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED
tests/p2p_all_to_all/test_p2p_all_to_all.py::test_p2p_all_to_all[TP2-NIC1-FP32-OVERRIDE80] [2026-03-19 00:47:38.692] 60960 INFO     pplx_garden.distributed.process_group [rank=0] Initializing global process group. device=cuda:0, init_method=file:///tmp/tmpowya966t/pplx_garden_parallel_init, world_size=2
[2026-03-19 00:47:38.693] 60961 INFO     pplx_garden.distributed.process_group [rank=1] Initializing global process group. device=cuda:1, init_method=file:///tmp/tmpowya966t/pplx_garden_parallel_init, world_size=2
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-03-19 00:47:40.259] 60961 INFO     pplx_garden.distributed.process_group [rank=1] Initialized global process group.
[2026-03-19 00:47:40.259] 60960 INFO     pplx_garden.distributed.process_group [rank=0] Initialized global process group.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2026-03-19 00:47:40.521] 60960 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:40.548] 60961 INFO     pplx_garden.kernels.p2p_all_to_all Setting up RDMA (2) + NVLink (1)
[2026-03-19 00:47:40.782] 60960 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:40.787] 60961 INFO     tests.p2p_all_to_all.test_p2p_all_to_all Stopping all-to-all
[2026-03-19 00:47:41.790] 60960 INFO     pplx_garden.distributed.process_group [rank=0] Destroyed global process group.
[2026-03-19 00:47:41.790] 60961 INFO     pplx_garden.distributed.process_group [rank=1] Destroyed global process group.
PASSED

================================================================================= warnings summary =================================================================================
../usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1397
  /usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1397: PytestConfigWarning: Unknown config option: asyncio_default_fixture_loop_scope

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================== 27 passed, 1 warning in 132.59s (0:02:12) =====================================================================

Add two optional parameters to P2PAllToAll:
- max_recv_tokens: explicit override for recv buffer token capacity
- recv_buffer_factor: multiplier on the balanced-routing estimate

The worst-case default can allocate much more GPU memory than needed,
causing OOM on memory-constrained setups. These parameters allow users
to right-size the buffer while preserving the original default behavior.

Extract buffer sizing into compute_max_recv_tokens() for testability.
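
For illustration, the priority order described above (explicit override, then factor, then worst-case default) might look roughly like the sketch below. This is not the PR's actual `compute_max_recv_tokens()`: the function name `sketch_max_recv_tokens`, the `world_size` and `top_k` arguments, and both formulas (worst case and balanced estimate) are assumptions made only to show the selection logic.

```python
import math
from typing import Optional


def sketch_max_recv_tokens(
    max_num_tokens: int,
    world_size: int,
    num_experts: int,
    top_k: int,
    max_recv_tokens: Optional[int] = None,
    recv_buffer_factor: Optional[float] = None,
) -> int:
    """Hypothetical recv-buffer sizing; formulas are illustrative assumptions."""
    num_local_experts = num_experts // world_size

    # Assumed worst case: every rank routes all of its tokens to this rank,
    # each token hitting as many local experts as top_k allows.
    worst_case = max_num_tokens * world_size * min(top_k, num_local_experts)

    # 1. An explicit override takes priority over everything else.
    if max_recv_tokens is not None:
        if max_recv_tokens <= 0:
            raise ValueError("max_recv_tokens must be positive")
        return max_recv_tokens

    # 2. Otherwise scale an assumed balanced estimate: all routed copies
    #    (max_num_tokens * world_size * top_k) spread evenly over the ranks.
    if recv_buffer_factor is not None:
        if recv_buffer_factor <= 0:
            raise ValueError("recv_buffer_factor must be positive")
        balanced = max_num_tokens * top_k
        # Never allocate more than the worst case would.
        return min(math.ceil(balanced * recv_buffer_factor), worst_case)

    # 3. Default: keep the worst-case allocation.
    return worst_case
```

Under these assumed formulas, leaving both parameters unset returns the worst-case value, which mirrors the stated goal of preserving the original default behavior.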

Add optional runtime overflow check via PPLX_CHECK_RECV_BUF_USAGE=1.
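
A rough sketch of how an environment-gated check of this kind could behave is shown below. Only the variable name `PPLX_CHECK_RECV_BUF_USAGE` comes from this PR; the helper name `maybe_check_recv_buffer`, its arguments, the message format, and the choice to raise on overflow are hypothetical. In the real kernel the received-token count lives on the GPU, so reading it requires a device-to-host sync, which is why the check is off by default.

```python
import os
from typing import Mapping, Optional


def maybe_check_recv_buffer(
    num_recv_tokens: int,
    capacity: int,
    rank: int,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical check: return a utilization message, or None if disabled."""
    # The check is opt-in; skip all work unless the variable is set to "1".
    if env.get("PPLX_CHECK_RECV_BUF_USAGE") != "1":
        return None
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Receiving more tokens than the buffer holds means it was sized too small.
    if num_recv_tokens > capacity:
        raise RuntimeError(
            f"[rank={rank}] recv buffer overflow: "
            f"{num_recv_tokens} tokens > capacity {capacity}"
        )
    pct = 100.0 * num_recv_tokens / capacity
    # Illustrative message format, not the PR's actual log line.
    return (
        f"[rank={rank}] recv buffer utilization: "
        f"{pct:.1f}% ({num_recv_tokens}/{capacity} tokens)"
    )
```

A consistently low percentage across ranks would suggest the buffer can be shrunk further, for example by lowering `recv_buffer_factor`.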
@crgg1433
Author

Hi @abcdabcd987,

This PR adds configurable recv buffer sizing for P2PAllToAll, so the buffer can be right-sized for better memory efficiency instead of always using the worst-case allocation.

Tests are green. Could you please take a look? Thanks!

@crgg1433
Author

crgg1433 commented Apr 1, 2026

Hi @abcdabcd987 @nandor, just following up on this PR. It would be great to get your feedback when you have time. Let me know if anything is blocking!
