pml/ob1: ensure RDMA fragments are released in the get -> send/recv fallback

Under a number of circumstances it may be necessary to abandon an RDMA get in
ob1. In some cases ob1 falls back to put, but it may instead fall back to
send/recv. When that happens, the RDMA fragment is still attached to the send
request, so we either leak it or crash: debug builds crash because of a check
on rdma_frag when the send request is returned. This CL fixes the flaw by
releasing any RDMA fragment when scheduling sends.

Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn committed Sep 19, 2024
1 parent 5f00259 commit 020e83f
Showing 1 changed file with 7 additions and 0 deletions: ompi/mca/pml/ob1/pml_ob1_sendreq.c
@@ -22,6 +22,7 @@
  * Copyright (c) 2018-2019 Triad National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2022 IBM Corporation. All rights reserved.
+ * Copyright (c) 2024 Google, LLC. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -1110,6 +1111,12 @@ mca_pml_ob1_send_request_schedule_once(mca_pml_ob1_send_request_t* sendreq)
 
     range = get_send_range(sendreq);
 
+    if (NULL != sendreq->rdma_frag) {
+        /* this request was first attempted with RDMA but is now using send/recv */
+        MCA_PML_OB1_RDMA_FRAG_RETURN(sendreq->rdma_frag);
+        sendreq->rdma_frag = NULL;
+    }
+
     while(range && (false == sendreq->req_throttle_sends ||
                     sendreq->req_pipeline_depth < mca_pml_ob1.send_pipeline_depth)) {
         mca_pml_ob1_frag_hdr_t* hdr;
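The following is a minimal, self-contained C sketch of the failure mode described in the commit message and the ordering used in the hunk above. It is not OB1 code: `rdma_frag_t`, `send_request_t`, `rdma_frag_alloc`, `rdma_frag_return`, `send_request_return`, `schedule_send_recv`, and the `frags_outstanding` counter are hypothetical stand-ins, and the `assert` only approximates the kind of debug-build check on `rdma_frag` that the message mentions. The idea it illustrates is that a request still holding a fragment from an abandoned RDMA attempt hands it back and clears the pointer before the send/recv loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for OB1's RDMA fragment and send
 * request; the real types and free-list macros are not modeled here. */
typedef struct rdma_frag {
    size_t rdma_length;          /* bytes the abandoned RDMA operation covered */
} rdma_frag_t;

typedef struct send_request {
    rdma_frag_t *rdma_frag;      /* fragment cached by an earlier RDMA attempt */
    size_t bytes_remaining;      /* bytes still to be sent via send/recv */
} send_request_t;

/* Counts fragments that have been handed out but not yet returned. */
static int frags_outstanding = 0;

static rdma_frag_t *rdma_frag_alloc(size_t length) {
    rdma_frag_t *frag = malloc(sizeof(*frag));
    if (NULL != frag) {
        frag->rdma_length = length;
        frags_outstanding++;
    }
    return frag;
}

static void rdma_frag_return(rdma_frag_t *frag) {
    frags_outstanding--;
    free(frag);
}

/* Approximates a debug-build check: a request must not be released
 * while it still owns an RDMA fragment. */
static void send_request_return(send_request_t *req) {
    assert(NULL == req->rdma_frag);
    (void) req;
}

/* Releases any cached RDMA fragment, then schedules the remaining data as
 * send/recv fragments of at most max_frag_size bytes. Returns how many
 * fragments were "sent" (no data actually moves in this sketch). */
static size_t schedule_send_recv(send_request_t *req, size_t max_frag_size) {
    size_t nfrags = 0;

    if (NULL != req->rdma_frag) {
        /* this request was first attempted with RDMA but is now using send/recv */
        rdma_frag_return(req->rdma_frag);
        req->rdma_frag = NULL;
    }

    if (0 == max_frag_size) {
        return 0;                    /* avoid looping forever on a bad size */
    }

    while (req->bytes_remaining > 0) {
        size_t len = req->bytes_remaining < max_frag_size ? req->bytes_remaining
                                                          : max_frag_size;
        req->bytes_remaining -= len; /* stand-in for posting one send fragment */
        nfrags++;
    }
    return nfrags;
}
```

Under these assumptions, a 10000-byte request that still carries a cached fragment, scheduled with 4096-byte fragments, goes out as 3 fragments and leaves `frags_outstanding` at 0. Setting the pointer to NULL right after returning the fragment matters as much as the return itself: it lets the release-time check pass and keeps later cleanup from handing the same fragment back a second time.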