Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the NUMA detection for the smcuda BTL. #12391

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented Mar 5, 2024

Code cleanup initiated by a discussion on Slack. If we fail to identify the process NUMA node, or if the process is associated with more than one, print a warning to inform the user about the potential performance implication.

@edgargabriel
Copy link
Member

@bosilca, just curious, is this a bug fix or code cleanup?

@bosilca
Copy link
Member Author

bosilca commented Mar 5, 2024

It all started from a discussion on Slack about a warning in the smcuda BTL pointed out by @wenduwan.

Copy link
Contributor

@wenduwan wenduwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Just 1 minor comment.

opal/mca/btl/smcuda/btl_smcuda.c Show resolved Hide resolved
@wenduwan wenduwan self-requested a review March 6, 2024 01:15
@wenduwan
Copy link
Contributor

wenduwan commented Mar 6, 2024

I tried this patch but still see the warning

btl_smcuda.c: In function ‘smcuda_btl_first_time_init’:
btl_smcuda.c:280:27: warning: ‘i’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  280 |             for (w = 0; w < i; w++) {
      |                         ~~^~~
  CCLD     mca_btl_smcuda.la

@bosilca
Copy link
Member Author

bosilca commented Mar 6, 2024

Fixed, please try again.

@wenduwan
Copy link
Contributor

wenduwan commented Mar 6, 2024

@bosilca Thank you. The warning is fixed.

Now it revealed something else. int i is not useful at all. On L427 I see

 427     i = opal_free_list_init(&mca_btl_smcuda_component.sm_frags_eager, length, opal_cache_line_size,
 428                             OBJ_CLASS(mca_btl_smcuda_frag1_t), length_payload, opal_cache_line_size,
 429                             mca_btl_smcuda_component.sm_free_list_num,
 430                             mca_btl_smcuda_component.sm_free_list_max,
 431                             mca_btl_smcuda_component.sm_free_list_inc,
 432                             mca_btl_smcuda_component.sm_mpool, 0, NULL, NULL, NULL);
 433     if (OPAL_SUCCESS != i)
 434         return i;

i is used instead of rc.

@wenduwan
Copy link
Contributor

wenduwan commented Mar 6, 2024

Thanks @bosilca . I will run some additional tests in the background(I don't expect any issue).

Feel free to merge otherwise.

@bosilca bosilca merged commit 8591403 into open-mpi:main Mar 6, 2024
7 checks passed
@bosilca bosilca deleted the fix/btl_smcuda_topo branch March 6, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants