Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework the MPI connect/accept code #12586

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented May 29, 2024

I'm puzzled by this code working. It does all the right things, but it was calling PML->add_procs with zero new procs, ALWAYS (the ilist was empty by the time we got to the add_procs call). But somehow it still worked, which seems to indicate that some of the PML are more tolerant to not having procs added correctly. But I'm sure that's not the case for all PML, especially for those requiring symmetric add_procs.

Thus, at this point this PR is a bandaid, trying to do the right thing, at least partially. I think the correct solution is to remove the ilist and always use the mlist (the list of all, local and remote, processes) to do the add_procs, even if we are registering again already known processes. This is the only way to have a symmetric add_proc everywhere.

Add more info when something bad happens during the handshake.

Signed-off-by: George Bosilca <[email protected]>
I still don't think this code is correct, but somehow it works. My main
concern is that the PML add_proc will be called with a different list of
processes, or at least in a different order, everywhere. Before this
patch it worked because we were calling it with ALWAYS 0 procs (because
by the time we got out of the loop the ilist was empty), so it was not
doing much. But, why was it working without a proper add_proc ? Do we
still need that add_proc there ?

Signed-off-by: George Bosilca <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant