Fix the info subscriber mechanism and hidden info keys #12498

Merged: 1 commit, May 3, 2024


@devreal (Contributor) commented Apr 26, 2024

PR #9246 was overzealous in dropping keys, which broke MPI_[Comm|Win|File]_set_info. Also, @wenduwan noted in #11823 that info keys are not properly propagated to subcommunicators in HAN. The root cause is that keys were not kept when they had no subscribers.

This PR makes the following changes:

  • Make sure info keys are always stored in the info object, independent of whether there are subscribers.
  • If there are no subscribers, we mark the keys as internal (previously this was done with an __IN_ prefix on the key; now it's a flag). Internal keys are not handed back to the user in MPI_[Comm|Win|File]_get_info.
  • Remove the function for explicitly managing the reference count and removing unreferenced info keys. We still track whether info keys are referenced, and keys that are not referenced are treated like internal keys.
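
For readers unfamiliar with the info code, the behavior described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in opal/util/info.c: every `sketch_` name, the fixed-size storage, and the `has_subscriber` parameter are hypothetical stand-ins for the real data structures and subscriber lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 16
#define SKETCH_MAX_LEN 64

/* Hypothetical entry type; the real opal_info_entry_t layout is not shown here. */
typedef struct {
    char key[SKETCH_MAX_LEN];
    char value[SKETCH_MAX_LEN];
    bool internal; /* a flag replaces the old "__IN_" key prefix */
} sketch_entry_t;

typedef struct {
    sketch_entry_t entries[SKETCH_MAX_ENTRIES];
    size_t count;
} sketch_info_t;

/* Return the index of key, or -1 if it is not stored. */
static int sketch_find(const sketch_info_t *info, const char *key)
{
    for (size_t i = 0; i < info->count; i++) {
        if (strcmp(info->entries[i].key, key) == 0) {
            return (int) i;
        }
    }
    return -1;
}

/* Always store the key. Without a subscriber it is only marked internal,
 * never dropped. Overlong keys and values are truncated in this sketch. */
int sketch_info_set(sketch_info_t *info, const char *key, const char *value,
                    bool has_subscriber)
{
    assert(info != NULL && key != NULL && value != NULL);
    int idx = sketch_find(info, key);
    if (idx < 0) {
        if (info->count == SKETCH_MAX_ENTRIES) {
            return -1; /* storage full */
        }
        idx = (int) info->count++;
        snprintf(info->entries[idx].key, SKETCH_MAX_LEN, "%s", key);
    }
    snprintf(info->entries[idx].value, SKETCH_MAX_LEN, "%s", value);
    info->entries[idx].internal = !has_subscriber;
    return 0;
}

/* Look up a key. User-facing queries pass include_internal = false so that
 * internal keys stay hidden; internal propagation passes true. */
const char *sketch_info_get(const sketch_info_t *info, const char *key,
                            bool include_internal)
{
    assert(info != NULL && key != NULL);
    int idx = sketch_find(info, key);
    if (idx < 0 || (info->entries[idx].internal && !include_internal)) {
        return NULL;
    }
    return info->entries[idx].value;
}
```

In this sketch a key set without a subscriber is still found when `include_internal` is true, which is the property that lets keys reach subcommunicators, while a user-facing query with `include_internal` set to false does not see it.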

@wenduwan
Contributor

I verified that with this PR, HAN's info keys are properly propagated to the subcommunicators.

Member

@bosilca bosilca left a comment

The need for this PR arose because the info code, lacking documentation, was misunderstood and capabilities were mistakenly removed. For the sake of future OMPI developers, we need to properly document these changes.

opal/util/info.c Outdated
@@ -67,12 +67,14 @@ OBJ_CLASS_INSTANCE(opal_info_entry_t, opal_list_item_t, info_entry_constructor,
/*
* Duplicate an info
Member

This is a major change; it deserves better documentation.

Contributor Author

Fixed

@@ -85,6 +87,16 @@ int opal_info_dup(opal_info_t *info, opal_info_t **newinfo)
return OPAL_SUCCESS;
}

int opal_info_dup_public(opal_info_t *info, opal_info_t **newinfo)
Member

Move these into a header file and make them always inline.

Contributor Author

I'm hesitant about this since there are no other implementations in info.h. The call to opal_info_dup_impl will be a jmp so there is little to gain from inlining these.

Member

A call into a function that does a simple jmp? I don't even see the point of having these oddly named functions instead of just using opal_info_dup with all 3 arguments.
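
For context, the two designs debated in this thread can be sketched side by side. This is a minimal illustration, not the code in this PR: the `demo_` types and functions are hypothetical, and the signature of the real opal_info_dup_impl is not shown in this conversation. The thin wrappers mirror the opal_info_dup / opal_info_dup_public split; the alternative raised in the comment above would expose the three-argument function directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified info object holding a few flagged entries. */
typedef struct {
    int value;
    bool internal;
} demo_entry_t;

typedef struct {
    demo_entry_t entries[8];
    size_t count;
} demo_info_t;

/* Shared implementation: copy every entry, or only the non-internal ones. */
int demo_info_dup_impl(const demo_info_t *src, demo_info_t *dst, bool public_only)
{
    assert(src != NULL && dst != NULL);
    dst->count = 0;
    for (size_t i = 0; i < src->count; i++) {
        if (public_only && src->entries[i].internal) {
            continue; /* hide internal entries from the public copy */
        }
        dst->entries[dst->count++] = src->entries[i];
    }
    return 0;
}

/* Thin wrappers: each body is a single tail call into the shared function. */
int demo_info_dup(const demo_info_t *src, demo_info_t *dst)
{
    return demo_info_dup_impl(src, dst, false);
}

int demo_info_dup_public(const demo_info_t *src, demo_info_t *dst)
{
    return demo_info_dup_impl(src, dst, true);
}
```

With the wrappers, call sites state their intent by name; with the single three-argument function, the same choice is made through the boolean argument. Whether the wrapper call is reduced to a plain jump depends on the compiler and optimization level.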

3 participants