Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare ompio frameworks for bigcount #12361

Merged
merged 2 commits into from
Mar 4, 2024

Conversation

jtronge
Copy link
Contributor

@jtronge jtronge commented Feb 22, 2024

This updates ompio-related frameworks and components to use size_t instead of int for counts.

I updated the version in the io framework from 2.0.0 to 3.0.0. fcoll and sharedfp both seem to be using different versions for the component and module structs, so I wasn't quite sure what version to use for those.

Update io/ompio, common/ompio, fcoll, and sharedfp frameworks and
components to use size_t instead of int for count types.

Signed-off-by: Jake Tronge <[email protected]>
@edgargabriel
Copy link
Member

I can confirm that this PR is correct in terms of high level interface changes. We also still pass our testsuites. However, I am 99% sure that we will need to add more work to actually make bigcount work, including probably adding at least some tests to our testsuite.

So if the plan is to get the interface changes in and work after that component by component to upgrade them to support bigcount (and I am happy to help with that), I would be ok with this pr.

@hppritcha
Copy link
Member

@edgargabriel do you approve this PR?

@hppritcha
Copy link
Member

@edgargabriel could you advise us on the module/component versioning? we think these should be change in fcoll and sharedfp?

@edgargabriel
Copy link
Member

@hppritcha I think you are correct in that we should probably increase the module/component versions in the fcoll and sharedfp frameworks. Ping me if you make that change, and I will rerun our tests before merging again

@hppritcha
Copy link
Member

@edgargabriel could you double check @jtronge 's latest commit ?

Copy link
Member

@edgargabriel edgargabriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have removed all type-casts associated with mallocs, please add them back, e.g.

- sizes_old_group = (int*)malloc(num_merge_aggrs * sizeof(int));
+ sizes_old_group = malloc(num_merge_aggrs * sizeof(size_t));

please leave it as

+ sizes_old_group = (size_t *)malloc(num_merge_aggrs * sizeof(size_t)); 

throughout the code base. I will re-review once that is fixed everywhere, but I would have to add too many messages/comments right now because of this issue.

@jtronge
Copy link
Contributor Author

jtronge commented Feb 29, 2024

@edgargabriel I've added back all the malloc type-casts.

When I was changing the versions in fcoll and sharedfp I noticed that I had missed some of the collective calls in fcoll/base, and that's what the majority of these additional changes are.

@edgargabriel
Copy link
Member

@jtronge thank you, I think it looks good. I will run tomorrow morning our testsuites on this branch, and will let you know after that whether the tests expose something in addition.

@edgargabriel
Copy link
Member

edgargabriel commented Mar 1, 2024

Tests look good, I think we are good to merge this pr. Thank you for this work @jtronge !

@wenduwan
Copy link
Contributor

wenduwan commented Mar 1, 2024

After merging the PR we should watch MTT for a while. If often catches corners cases that we don't normally test.

@wenduwan wenduwan closed this Mar 1, 2024
@wenduwan wenduwan reopened this Mar 1, 2024
@wenduwan
Copy link
Contributor

wenduwan commented Mar 1, 2024

(Clicked wrong button)

@hppritcha
Copy link
Member

@edgargabriel could you click the review again?

@hppritcha
Copy link
Member

canceled NVIDA CI as it seems to be unable to recover from power outage any time soon.

@hppritcha hppritcha merged commit 067545a into open-mpi:main Mar 4, 2024
11 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants