TL/UCP: Add linear alltoall and allgather algorithms based on xgvmi ucp get #992

Open
wants to merge 3 commits into
base: master

Conversation

nsarka
Collaborator

@nsarka nsarka commented Jun 24, 2024

This PR is a follow-up to the sliding-window allreduce. It adds linear alltoall and allgather algorithms based on XGVMI. They post UCP gets from host to host in a round-robin fashion.
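
For readers unfamiliar with the pattern, here is a minimal sketch of what a linear alltoall built from round-robin gets could look like. It is not the code from this PR: `sim_get`, `round_robin_peer`, and `linear_alltoall_rank` are hypothetical names, the get is simulated with `memcpy` rather than a real nonblocking UCP get, and the peer order `(rank + step) % team_size` is an assumption about what "round robin" means here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a one-sided get: copies len bytes from a
 * "remote" buffer into a local one. A real implementation would post a
 * nonblocking UCP get instead. */
static void sim_get(void *local, const void *remote, size_t len)
{
    memcpy(local, remote, len);
}

/* Peer visited by `rank` at step `step`. Each rank starts at itself and
 * walks forward, so at any given step no two ranks target the same peer. */
static int round_robin_peer(int rank, int step, int team_size)
{
    assert(team_size > 0 && rank >= 0 && rank < team_size);
    return (rank + step) % team_size;
}

/* Linear alltoall for one rank: fetch block `rank` from every peer's
 * source buffer and store it in slot `peer` of our destination buffer. */
static void linear_alltoall_rank(int rank, int team_size, size_t block_len,
                                 char *const *src_bufs, char *dst)
{
    for (int step = 0; step < team_size; step++) {
        int peer = round_robin_peer(rank, step, team_size);
        sim_get(dst + (size_t)peer * block_len,
                src_bufs[peer] + (size_t)rank * block_len, block_len);
    }
}
```

An allgather under the same assumptions would differ only in the source offset: every rank would read the peer's single contribution instead of block `rank`.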

@nsarka nsarka marked this pull request as draft June 25, 2024 21:31
@nsarka nsarka changed the title from "TL/UCP: Add xgvmi allgather" to "TL/UCP: Add linear alltoall and allgather algorithms based on xgvmi ucp get" Jun 26, 2024
@nsarka nsarka marked this pull request as ready for review June 26, 2024 17:11
Collaborator

@janjust janjust left a comment

Went over the code with Nick, LGTM

@janjust
Collaborator

janjust commented Jun 26, 2024

I don't think we should do this now, but these algorithms, including the sliding-window allreduce, will not need the allgather in the init function once #909 is merged.

Collaborator

@samnordmann samnordmann left a comment

LGTM overall, thanks! I still need to go over the main file, tl_ucp_dpu_offload.c.

In the meantime, here is a first round of minor review comments.

@@ -74,6 +75,9 @@ ucc_status_t ucc_tl_ucp_allgather_sparbit_init(ucc_base_coll_args_t *coll_args,
ucc_base_team_t *team,
ucc_coll_task_t **task_h);

/* XGVMI */
void ucc_tl_ucp_dpu_xgvmi_rdma_progress_allgather(ucc_coll_task_t *coll_task);
Collaborator

Suggested change
- void ucc_tl_ucp_dpu_xgvmi_rdma_progress_allgather(ucc_coll_task_t *coll_task);
+ void ucc_tl_ucp_dpu_allgather_xgvmi_rdma_progress(ucc_coll_task_t *coll_task);

This keeps the naming pattern "tl_coll_algo_(init|start|progress|finalize)".

@@ -0,0 +1,73 @@
/**
* Copyright(c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Collaborator

nit:

Suggested change
- * Copyright(c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright(c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.


req_param.op_attr_mask |= UCP_OP_ATTR_FIELD_MEMH;

for (i = *posted; i < host_team_size; i++) {
Collaborator

Is it possible that *posted is not 0 when entering this function?
Would it make sense to put this first loop in a "start" function instead?


ucp_worker_progress(tl_ctx->worker.ucp_worker);

for (i = *completed; i < *posted; i++) {
Collaborator

Here, *posted is necessarily equal to host_team_size, right?
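
To make the two questions about `*posted` concrete, here is a hedged sketch of a progress function that can be re-entered with a nonzero posted count. It assumes a cap on in-flight gets, which this thread does not confirm the PR uses; the struct, its field names, and `sim_post_get` are all hypothetical, and the transport is simulated so that every get completes immediately.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PEERS 64

/* Hypothetical task state; field names are illustrative, not the PR's. */
typedef struct {
    int  team_size;
    int  max_inflight;     /* assumed cap on outstanding gets */
    int  posted;           /* gets posted so far */
    int  completed;        /* gets known to be complete */
    bool done[MAX_PEERS];  /* set by the (simulated) transport */
} xfer_task_t;

/* Stand-in for posting a nonblocking get; here it completes at once. */
static void sim_post_get(xfer_task_t *t, int peer)
{
    t->done[peer] = true;
}

/* One progress call: post as many gets as the cap allows, then retire
 * completions in order. Returns true once every get has completed. */
static bool xfer_progress(xfer_task_t *t)
{
    assert(t->team_size <= MAX_PEERS && t->max_inflight > 0);

    /* Resumes from t->posted, which is nonzero on any call after the first. */
    while (t->posted < t->team_size &&
           t->posted - t->completed < t->max_inflight) {
        sim_post_get(t, t->posted);
        t->posted++;
    }

    /* Under a cap, t->posted can still be below team_size at this point. */
    while (t->completed < t->posted && t->done[t->completed]) {
        t->completed++;
    }

    return t->completed == t->team_size;
}
```

If posting is never throttled, the first loop always runs from 0 to the team size in a single call; in that case moving it into a "start" function would be equivalent, and the posted count would always equal the team size by the time the second loop runs.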

@@ -14,6 +14,7 @@ enum {
UCC_TL_UCP_ALLTOALL_ALG_PAIRWISE,
UCC_TL_UCP_ALLTOALL_ALG_BRUCK,
UCC_TL_UCP_ALLTOALL_ALG_ONESIDED,
UCC_TL_UCP_ALLTOALL_ALG_LINEAR_XGVMI,
Collaborator

Please fix the copyright here, and check the other files as well.

@@ -136,6 +136,14 @@ typedef struct ucc_tl_ucp_task {
ucc_ee_executor_task_t *reduce_task;
ucc_tl_ucp_dpu_offload_buf_info_t *bufs;
} allreduce_sliding_window;
struct {
ucc_tl_ucp_allreduce_sw_host_allgather *allgather_data;
Collaborator

extra spaces

@@ -49,5 +49,22 @@ ucc_status_t ucc_tl_ucp_allreduce_sliding_window_register(
ucp_context_h ucp_context, ucc_tl_ucp_team_t *tl_team,
struct ucc_tl_ucp_allreduce_sw_export_buf *ebuf, void *packed_memh);

void ucc_tl_ucp_dpu_xgvmi_free_rkeys(
Collaborator

Declaration could be removed from the header

ucc_coll_task_t *coll_task);

ucc_status_t
ucc_tl_ucp_dpu_xgvmi_rdma_task_finalize(
Collaborator

Same here.

@samnordmann
Collaborator

Can you update the tests as well?


3 participants