Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHMEM_LOCKS: MCS implementation of SHMEM LOCKS #11796

Merged
merged 1 commit into from
Oct 18, 2023

Conversation

vvenkates27
Copy link
Contributor

@vvenkates27 vvenkates27 commented Jun 30, 2023

Adding MCS algorithm-based implementation for shmem_locks to improve performance for large scale SHMEM applications using locks. MCS lock is now the default algorithm, use the following MCA parameter to disable.
-- Use MCA parameter --mca oshmem_enable_mcs_lock 0 to disable mcs locks and revert to default ticket locking.
-- Use oshmem_api_verbose 10 for debug information on shmem_locks.
@manjugv @brminich @rakhmets @yosefe

Copy link
Member

@brminich brminich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
have some minor comments

@@ -27,5 +28,10 @@

void shmem_clear_lock(volatile long *lock)
{
_shmem_clear_lock((void *)lock, sizeof(long));
if (oshmem_shmem_mcs_locks) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general comment: pls use 4 space tabs as per OMPI Coding Style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked and fixed all files updated in this PR to stick to 4 space tabs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please double check? E.g. looks like 8-width tabs are used in this file (and possibly others)

@@ -27,5 +28,10 @@

void shmem_clear_lock(volatile long *lock)
{
_shmem_clear_lock((void *)lock, sizeof(long));
if (oshmem_shmem_mcs_locks) {
SHMEM_API_VERBOSE(10, "shmem_clear_lock using MCS Lock");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move this print to the very beggining of the func and also print locking scheme being cleaned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the print and also printing the locking scheme now with the latest commit

Comment on lines 48 to 49
type prefix##_ctx##type_name##_atomic_fetch(shmem_ctx_t ctx, \
const type *target, int pe) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated change, pls remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 272 to 273
MCA_ATOMIC_CALL(cswap(oshmem_ctx_default, target, (void*)&prev_value,
cond, value, target_size, pe));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls consider pushing changes in this file in a separate PR, as they are not relavant for the new locking algo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed unwanted changes from this PR.

Comment on lines 1 to 6
/*
* Copyright (c) 2013 Mellanox Technologies, Inc.
* All rights reserved.
* Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2015 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this is newly introduced file, i'd guess only relevant Nvidia copyright notice should be present

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the License

Comment on lines 122 to 123
* shmem_atomic_add(next, my_pe - NEXT_MASK,
prev_tailpe);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we can remove this. also

/**
* Wait for predecessor release lock to this PE
* signal to false.
* int curr = shmem_atomic_fetch(next, my_pe);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, artifact from previous implementation removed.

Comment on lines 199 to 200
* prev_value = shmem_atomic_compare_swap(tail, swap_cond, 0,
* mcs_tail_owner);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it almost the same what is used below?
same comment is relevant for other similar comments in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this comment.

Comment on lines 291 to 299
shmem_mcs_internal_set_lock(lock);
}

void _shmem_mcs_clear_lock(long *lock) {
shmem_mcs_internal_clear_lock(lock);
}

int _shmem_mcs_test_lock(long *lock) {
return shmem_mcs_internal_test_lock(lock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need these wrappers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was intended to keep similar to the existing implementation. Doesn't have purpose for this implementation. Removing it.

@@ -27,5 +28,10 @@

void shmem_set_lock(volatile long *lock)
{
_shmem_set_lock((void *)lock, sizeof(long));
if (oshmem_shmem_mcs_locks) {
SHMEM_API_VERBOSE(10, "shmem_set_lock using MCS Lock");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe such comments are worth to be common for all type of locks? (i.e. to move them to the very beginning of func)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and moved in the latest commit.

@vvenkates27
Copy link
Contributor Author

vvenkates27 commented Aug 3, 2023

👍 have some minor comments

Thank you for the review. I have fixed all the issues pointed in the comments and uploaded a new patch. Please review and let me know if you have any other comments.

@@ -27,5 +28,10 @@

void shmem_clear_lock(volatile long *lock)
{
_shmem_clear_lock((void *)lock, sizeof(long));
if (oshmem_shmem_mcs_locks) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please double check? E.g. looks like 8-width tabs are used in this file (and possibly others)

/*
* Copyright (c) 2023 NVIDIA Corporation.
* All rights reserved.
* and Technology (RIST). All rights reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it was, removed now.

RUNTIME_CHECK_RC(retv);

/**
* This value to be changed eventually by predecessor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have redone the alignment for the whole file now, hopefully this patch fixes the alignment issues.

Comment on lines 158 to 160
int next_value = 0;
int swap_cond = 0;
int prev_value = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for extra spaces before =

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

int
_shmem_mcs_test_lock(long *lockp)
{
lock_t *lock = (lock_t *) lockp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls align by = for consistency with other vars in this func

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 253 to 254
if (0 == prev_tail)
return 0; /** lock acquired successfully */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please use {} for every statement, like

Suggested change
if (0 == prev_tail)
return 0; /** lock acquired successfully */
if (0 == prev_tail) {
return 0; /** lock acquired successfully */
}

(we try to use this style for oshmem and ucx pml code)
also relevant for other one line statements in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -27,5 +29,12 @@

void shmem_set_lock(volatile long *lock)
{
_shmem_set_lock((void *)lock, sizeof(long));
SHMEM_API_VERBOSE(10, "shmem_set_lock");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove this one now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -28,5 +31,12 @@

int shmem_test_lock(volatile long *lock)
{
return _shmem_test_lock((void *)lock, sizeof(long));
SHMEM_API_VERBOSE(10, "shmem_test_lock");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

94dae7c: Update shmem_mcs_lock.c - minor alignment for "="

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

4b2a05a: Update shmem_mcs_lock.c - aligning a commit

  • check_signed_off: does not contain a valid Signed-off-by line

94dae7c: Update shmem_mcs_lock.c - minor alignment for "="

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

923785e: Update shmem_test_lock.c - Remove unwanted prints

  • check_signed_off: does not contain a valid Signed-off-by line

26b3a73: Update shmem_clear_lock.c remove unwanted print.

  • check_signed_off: does not contain a valid Signed-off-by line

4b2a05a: Update shmem_mcs_lock.c - aligning a commit

  • check_signed_off: does not contain a valid Signed-off-by line

94dae7c: Update shmem_mcs_lock.c - minor alignment for "="

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

1 similar comment
@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

923785e: Update shmem_test_lock.c - Remove unwanted prints

  • check_signed_off: does not contain a valid Signed-off-by line

26b3a73: Update shmem_clear_lock.c remove unwanted print.

  • check_signed_off: does not contain a valid Signed-off-by line

4b2a05a: Update shmem_mcs_lock.c - aligning a commit

  • check_signed_off: does not contain a valid Signed-off-by line

94dae7c: Update shmem_mcs_lock.c - minor alignment for "="

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

3786663: Update shmem_mcs_lock.c - Some more minor alignmen...

  • check_signed_off: does not contain a valid Signed-off-by line

923785e: Update shmem_test_lock.c - Remove unwanted prints

  • check_signed_off: does not contain a valid Signed-off-by line

26b3a73: Update shmem_clear_lock.c remove unwanted print.

  • check_signed_off: does not contain a valid Signed-off-by line

4b2a05a: Update shmem_mcs_lock.c - aligning a commit

  • check_signed_off: does not contain a valid Signed-off-by line

94dae7c: Update shmem_mcs_lock.c - minor alignment for "="

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

5b47412: Update shmem_mcs_lock.c - removed unwanted lines/s...

  • check_signed_off: does not contain a valid Signed-off-by line

3786663: Update shmem_mcs_lock.c - Some more minor alignmen...

  • check_signed_off: does not contain a valid Signed-off-by line

923785e: Update shmem_test_lock.c - Remove unwanted prints

  • check_signed_off: does not contain a valid Signed-off-by line

26b3a73: Update shmem_clear_lock.c remove unwanted print.

  • check_signed_off: does not contain a valid Signed-off-by line

4b2a05a: Update shmem_mcs_lock.c - aligning a commit

  • check_signed_off: does not contain a valid Signed-off-by line

94dae7c: Update shmem_mcs_lock.c - minor alignment for "="

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

6936a0e: Commits from github online without signed-off opti...

  • check_signed_off: does not contain a valid Signed-off-by line

5b47412: Update shmem_mcs_lock.c - removed unwanted lines/s...

  • check_signed_off: does not contain a valid Signed-off-by line

3786663: Update shmem_mcs_lock.c - Some more minor alignmen...

  • check_signed_off: does not contain a valid Signed-off-by line

923785e: Update shmem_test_lock.c - Remove unwanted prints

  • check_signed_off: does not contain a valid Signed-off-by line

26b3a73: Update shmem_clear_lock.c remove unwanted print.

  • check_signed_off: does not contain a valid Signed-off-by line

4b2a05a: Update shmem_mcs_lock.c - aligning a commit

  • check_signed_off: does not contain a valid Signed-off-by line

94dae7c: Update shmem_mcs_lock.c - minor alignment for "="

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@brminich
Copy link
Member

@yosefe, pls review

Comment on lines 11 to 13
/**
* @file
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that the comment is not needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

/** has meaning only on MCSQ_TAIL OWNER */
int tail;
/** It has meaning on all PEs */
/** The next pointer is a combination of the PE ID and wait signal */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** The next pointer is a combination of the PE ID and wait signal */
/** The next pointer is a combination of the PE ID and wait signal */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 46 to 48
#define SIGNAL_MASK 0x80000000U //Wait signal mask
#define NEXT(A) (A & NEXT_MASK)
#define GET_PE(P) (P & NEXT_MASK) // Improve readability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use C-style comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* releasing it.
*/
/**
* Can make this to be shmem_atomic_set to be safe in non-cc architecutres
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Can make this to be shmem_atomic_set to be safe in non-cc architecutres
* Can make this to be shmem_atomic_set to be safe in non-cc architectures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

int retv = 0;

/**
* Can make atomic fetch to be safe in non-cc architecutres
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Can make atomic fetch to be safe in non-cc architecutres
* Can make atomic fetch to be safe in non-cc architectures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -9,7 +11,6 @@
*
* $HEADER$
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please return back the empty line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@rakhmets rakhmets Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to replace sizeof(int) by sizeof(variable_name) in swap, add, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, swap, add calls are internal MCA calls of shmem_int__ calls. sizeof(int) can help with readability than the variable name.

oshmem/runtime/oshmem_shmem_params.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_clear_lock.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_mcs_lock.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_mcs_lock.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_mcs_lock.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_mcs_lock.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_mcs_lock.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_mcs_lock.c Outdated Show resolved Hide resolved
oshmem/shmem/c/shmem_mcs_lock.c Outdated Show resolved Hide resolved
@brminich
Copy link
Member

@vvenkates27 , can you please squash the commits?

Adding MCS algorithm-based implementation for shmem_locks
to improve performance for large scale SHMEM applications using locks.
MCS lock is now the default algorithm, use the following MCA parameter
to disable. --mca oshmem_enable_mcs_lock 0 to disable mcs locks and
revert to default ticket locking. --mca oshmem_api_verbose 10 for debug
information on shmem_locks.

Signed-off-by: Vishwanath Venkatesan <[email protected]>
@brminich brminich merged commit 67026ef into open-mpi:main Oct 18, 2023
9 of 10 checks passed
@brminich
Copy link
Member

@vvenkates27 can you please port it to 4.1.x and 5.0 branches?

vvenkates27 pushed a commit to vvenkates27/ompi that referenced this pull request Oct 18, 2023
SHMEM_LOCKS: MCS implementation of SHMEM LOCKS

Adding MCS algorithm-based implementation for shmem_locks
to improve performance for large scale SHMEM applications using locks.
MCS lock is now the default algorithm, use the following MCA parameter
to disable. --mca oshmem_enable_mcs_lock 0 to disable mcs locks and
revert to default ticket locking. --mca oshmem_api_verbose 10 for debug
information on shmem_locks.

Signed-off-by: Vishwanath Venkatesan <[email protected]>
@vvenkates27
Copy link
Contributor Author

@vvenkates27 can you please port it to 4.1.x and 5.0 branches?

Submitted pull requests for both branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants