TL/UCP: Add linear alltoall and allgather algorithms based on xgvmi ucp get #992
base: master
Went over the code with Nick, LGTM
I don't think we should do this now, but these algorithms, including sliding-window allreduce, will no longer need the allgather in the init function once #909 is merged.
LGTM overall, thanks! I still need to go over the main file, tl_ucp_dpu_offload.c.
Just a first round of minor review in the meantime
@@ -74,6 +75,9 @@ ucc_status_t ucc_tl_ucp_allgather_sparbit_init(ucc_base_coll_args_t *coll_args,
ucc_base_team_t *team,
ucc_coll_task_t **task_h);

/* XGVMI */
void ucc_tl_ucp_dpu_xgvmi_rdma_progress_allgather(ucc_coll_task_t *coll_task);
Suggested change:
- void ucc_tl_ucp_dpu_xgvmi_rdma_progress_allgather(ucc_coll_task_t *coll_task);
+ void ucc_tl_ucp_dpu_allgather_xgvmi_rdma_progress(ucc_coll_task_t *coll_task);
to keep a pattern "tl_coll_algo_(init|start|progress|finalize)"
@@ -0,0 +1,73 @@
/**
* Copyright(c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
nit:
Suggested change:
- * Copyright(c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright(c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
req_param.op_attr_mask |= UCP_OP_ATTR_FIELD_MEMH;

for (i = *posted; i < host_team_size; i++) {
Is it possible that posted is not 0 when entering this function?
Would it make sense to put this first loop in a "start" function instead?
ucp_worker_progress(tl_ctx->worker.ucp_worker);

for (i = *completed; i < *posted; i++) {
Here *posted is necessarily equal to host_team_size, right?
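Both questions above come down to whether the posting loop can exit before it reaches host_team_size. The following is a minimal sketch of a re-entrant post-then-complete progress function, using mock requests instead of UCX calls. Every name in it (mock_req_t, mock_task_t, max_inflight, and so on) is hypothetical and not taken from the PR, and the in-flight throttle is an assumption added only to show when posted can be non-zero on entry.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_PEERS 16

/* Hypothetical stand-in for a one-sided get request; not a UCX type. */
typedef struct {
    int polls_left; /* polls remaining before the mock request completes */
} mock_req_t;

typedef enum { MOCK_OK, MOCK_INPROGRESS } mock_status_t;

/* Hypothetical task state: the counters persist across progress calls. */
typedef struct {
    mock_req_t reqs[MOCK_MAX_PEERS];
    size_t     team_size;    /* number of peers to read from */
    size_t     max_inflight; /* assumed throttle on outstanding requests */
    size_t     posted;
    size_t     completed;
} mock_task_t;

static void mock_task_init(mock_task_t *t, size_t team_size,
                           size_t max_inflight)
{
    assert(team_size <= MOCK_MAX_PEERS && max_inflight > 0);
    t->team_size    = team_size;
    t->max_inflight = max_inflight;
    t->posted       = 0;
    t->completed    = 0;
}

static mock_status_t mock_task_progress(mock_task_t *t)
{
    size_t i;

    /* Posting loop: resumes from t->posted and stops early when the
     * throttle is hit, so a later call can enter with posted != 0. */
    for (i = t->posted; i < t->team_size; i++) {
        if (t->posted - t->completed >= t->max_inflight) {
            break;
        }
        t->reqs[i].polls_left = 1; /* mock: completes on the second poll */
        t->posted++;
    }

    /* Completion loop: resumes from t->completed and stops at the first
     * request that is still outstanding. */
    for (i = t->completed; i < t->posted; i++) {
        if (t->reqs[i].polls_left > 0) {
            t->reqs[i].polls_left--;
            break;
        }
        t->completed++;
    }

    return (t->completed == t->team_size) ? MOCK_OK : MOCK_INPROGRESS;
}
```

In this sketch, posted equals team_size by the time the completion loop runs only when max_inflight is at least team_size. If the real posting loop has no early exit, it always runs to the end on the first call, which is the case where moving it into a separate start function would be a natural fit.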
@@ -14,6 +14,7 @@ enum {
UCC_TL_UCP_ALLTOALL_ALG_PAIRWISE,
UCC_TL_UCP_ALLTOALL_ALG_BRUCK,
UCC_TL_UCP_ALLTOALL_ALG_ONESIDED,
UCC_TL_UCP_ALLTOALL_ALG_LINEAR_XGVMI,
Please fix the copyright here, and check the other files as well.
@@ -136,6 +136,14 @@ typedef struct ucc_tl_ucp_task {
ucc_ee_executor_task_t *reduce_task;
ucc_tl_ucp_dpu_offload_buf_info_t *bufs;
} allreduce_sliding_window;
struct {
ucc_tl_ucp_allreduce_sw_host_allgather *allgather_data;
extra spaces
@@ -49,5 +49,22 @@ ucc_status_t ucc_tl_ucp_allreduce_sliding_window_register(
ucp_context_h ucp_context, ucc_tl_ucp_team_t *tl_team,
struct ucc_tl_ucp_allreduce_sw_export_buf *ebuf, void *packed_memh);

void ucc_tl_ucp_dpu_xgvmi_free_rkeys(
Declaration could be removed from the header
ucc_coll_task_t *coll_task);

ucc_status_t
ucc_tl_ucp_dpu_xgvmi_rdma_task_finalize(
Same here: the declaration could be removed from the header.
Can you update the tests as well?
This PR is a follow-up to the sliding-window allreduce. It adds linear alltoall and allgather algorithms based on XGVMI. They post UCP get operations from host to host in a round-robin fashion.
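The description says only that the gets are issued in a round-robin fashion and does not give the exact peer order. One common way to build such a schedule is to offset the peer index by the caller's rank, so that at each step every rank reads from a different peer. The sketch below illustrates that idea; rr_peer and rr_schedule are hypothetical helpers, and the ordering is an assumption rather than the PR's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the peer that `rank` reads from at `step`.
 * Assumed ordering: start at the caller's own rank and wrap around. */
static size_t rr_peer(size_t rank, size_t step, size_t team_size)
{
    assert(team_size > 0 && rank < team_size);
    return (rank + step) % team_size;
}

/* Fill `order` (team_size entries) with the peers in the sequence
 * that `rank` would visit them. */
static void rr_schedule(size_t rank, size_t team_size, size_t *order)
{
    size_t step;

    for (step = 0; step < team_size; step++) {
        order[step] = rr_peer(rank, step, team_size);
    }
}
```

With a team of four, rank 2 would visit peers 2, 3, 0, 1 under this ordering. Because the offsets differ per rank, no two ranks target the same peer at the same step, which spreads the get traffic instead of having every rank hit peer 0 first.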