-
Notifications
You must be signed in to change notification settings - Fork 919
COLL/UCC: add persistent collective calls for UCC #13374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
COLL/UCC: add persistent collective calls for UCC #13374
Conversation
Signed-off-by: hasegawa.kento <[email protected]>
Signed-off-by: hasegawa.kento <[email protected]>
Signed-off-by: hasegawa.kento <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have few nitpick comments for the entire commit:
- 'iniz' has little meaning. How about init_common instead ?
- 'a' as a siffix has no meaning while ' alias' does !
- in the original code when the call to the UCC collective _init was made it was multiline with clear separation between send and receive argument. Please maintain that structure, and add the persistent bool flag in front of the ucc_module.
ompi/mca/coll/ucc/coll_ucc.h
Outdated
@@ -61,6 +62,7 @@ struct mca_coll_ucc_component_t { | |||
ucc_lib_attr_t ucc_lib_attr; | |||
ucc_coll_type_t cts_requested; | |||
ucc_coll_type_t nb_cts_requested; | |||
ucc_coll_type_t pc_cts_requested; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does pc stand for persistent collective? maybe ps_cts_requested is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that ps_cts_requested
is better. I changed pc_cts_requested to ps_cts_requested
.
ompi/mca/coll/ucc/coll_ucc.h
Outdated
@@ -132,6 +134,34 @@ struct mca_coll_ucc_module_t { | |||
mca_coll_base_module_t* previous_scatter_module; | |||
mca_coll_base_module_iscatter_fn_t previous_iscatter; | |||
mca_coll_base_module_t* previous_iscatter_module; | |||
mca_coll_base_module_allreduce_init_fn_t previous_allreduce_init; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted the alignment
@@ -57,6 +57,10 @@ static inline ucc_status_t mca_coll_ucc_allgatherv_init(const void *sbuf, size_t | |||
} | |||
}; | |||
|
|||
if (true == persistent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make it part of line 37 where other flags are set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed persistent flag to be set at line 37
|
||
if ((bz > 0) && (bp != 0)) { | ||
bp[0] = '\0'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to continue this function if bp/bz is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it.
if (0 == strcmp(cp_suffix, "_init")) { | ||
if ((bz > 0) && (bp != 0)) { | ||
if (blen >= bz) { | ||
return 0 /* XXX internal error */; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return -1 to separate error from other possible results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed to return -1
ompi/mca/coll/ucc/coll_ucc_module.c
Outdated
@@ -87,6 +88,34 @@ static void mca_coll_ucc_module_clear(mca_coll_ucc_module_t *ucc_module) | |||
ucc_module->previous_scatter_module = NULL; | |||
ucc_module->previous_iscatter = NULL; | |||
ucc_module->previous_iscatter_module = NULL; | |||
ucc_module->previous_allreduce_init = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted the alignment
@@ -157,18 +215,33 @@ static void mca_coll_ucc_init_default_cts(void) | |||
n_cts = opal_argv_count(cts); | |||
cm->cts_requested = disable ? COLL_UCC_CTS : 0; | |||
cm->nb_cts_requested = disable ? COLL_UCC_CTS : 0; | |||
cm->pc_cts_requested = 0; /* XXX PC currently disabled by default */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how it supposed to work, maybe I missing something. It seems like disable mode doesn't work for persistent colls as for other. If user gives ^allreduce_init it means enable every persistent collective except allreduce, however since cts_requested is 0 initialy it won't change anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it so that ^allreduce_init works correctly.
ompi/mca/coll/ucc/coll_ucc_module.c
Outdated
@@ -530,11 +565,32 @@ mca_coll_ucc_module_disable(mca_coll_base_module_t *module, | |||
UCC_UNINSTALL_COLL_API(comm, ucc_module, reduce); | |||
UCC_UNINSTALL_COLL_API(comm, ucc_module, ireduce); | |||
UCC_UNINSTALL_COLL_API(comm, ucc_module, gather); | |||
/* UCC_UNINSTALL_COLL_API(comm, ucc_module, igather); */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was originally missing statement, but I've added a comment as I'm unsure if it's a bug.
I uncommented for missing statement.
ompi/mca/coll/ucc/coll_ucc_module.c
Outdated
if (true != coll_req->super.req_persistent) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added error handling
Signed-off-by: hasegawa.kento <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor consistency issue, but otherwise this is ready for merge.
Signed-off-by: hasegawa.kento <[email protected]>
What
In addition to the existing blocking and non-blocking collective calls,
this PR adds persistent collective call in COLL/UCC.
For example,
How
This adds <coll>_init function for each collective operation,
using UCC_COLL_ARGS_FLAG_PERSISTENT flag [1].
Reference
[1] "Unified Collective Communications (UCC) Specification Version 1.0" (2022/02/18)
https://openucx.github.io/ucc/api/v1.0/pdf/ucc.pdf
- UCC_COLL_ARGS_FLAG_PERSISTENT in Section 8.8.4.2 "ucc_coll_args_flags_t"