feat: add configurable recv buffer sizing for P2PAllToAll#10
Open
crgg1433 wants to merge 1 commit intoperplexityai:mainfrom
Open
feat: add configurable recv buffer sizing for P2PAllToAll#10crgg1433 wants to merge 1 commit intoperplexityai:mainfrom
crgg1433 wants to merge 1 commit intoperplexityai:mainfrom
Conversation
Add two optional parameters to P2PAllToAll: - max_recv_tokens: explicit override for recv buffer token capacity - recv_buffer_factor: multiplier on the balanced-routing estimate The worst-case default can allocate much more GPU memory than needed, causing OOM on memory-constrained setups. These parameters allow users to right-size the buffer while preserving the original default behavior. Extract buffer sizing into compute_max_recv_tokens() for testability. Add optional runtime overflow check via PPLX_CHECK_RECV_BUF_USAGE=1.
Author
|
Hi @abcdabcd987, This PR adds configurable recv buffer sizing logic for P2PAllToAll for best sizing and memory efficiency. Tests are green. Could you please take a look? Thanks! |
Author
|
Hi @abcdabcd987 @nandor, just following up on this PR. It would be great to get your feedback when you have time. Let me know if anything is blocking! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds configurable recv buffer sizing for P2PAllToAll:
max_recv_tokensrecv_buffer_factorThree sizing strategies are supported (in order of priority):
max_recv_tokens_override: explicit override for the recv buffer token capacity. Recommended when the routing behavior is well understood, for best sizing and memory efficiency.recv_buffer_factor: multiplier on the balanced estimate.Usage
An optional runtime buffer utilization check is available via
PPLX_CHECK_RECV_BUF_USAGE=1to help tune buffer sizing (requires a d2h sync, so it is off by default). It logs the percentage of buffer utilization per rank, helping identify how much the buffer can be further shrunk.The buffer sizing logic is extracted into a standalone
compute_max_recv_tokens()function for testability.Purpose
The original implementation computes recv buffer size from a worst-case bound that assumes maximally imbalanced routing. This can allocate much more GPU memory (~4x in practice) than actually requires, wasting memory or even causing CUDA OOM at init time on memory-constrained setups.
Default behavior is unchanged — when neither parameter is specified, the original worst-case allocation is used.
Important caveat:
The current one-sided RDMA write path has no graceful overflow handling. If buffer overflows, the Rust RDMA handler will panic code. Token dropping would require changes to the CUDA dispatch kernel and the Rust RDMA layer, which is out of scope here. Users should verify their buffer is sufficient using the overflow check before running production workloads.
Test
Added unit test to cover the buffer sizing logic