-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs/ufs: change default locking protocol #12756
fs/ufs: change default locking protocol #12756
Conversation
The fs/ufs component by default disabled all file locking before read/write operations (except for NFS file systems). This was based on the assumption, that the file system itself performs the required locking operation and hence we don't have to add to it. This assumption is incorrect when using data sieving. In data sieving, the code 'ignore' small gaps when we write to a file, and perform instead a read-modify-write sequence ourselves for performance reasons. The problem is however that even within a collective operation not all aggregators might want to use data sieving. Hence, enabling locking just for the data-sieving routines is insufficient, all processes have to perform the locking. Therefore, our two options are: a) either disable write data-sieving by default, or b) enable range-locking by default. After some testing, I think enabling range-locking be default is the safer and better approach. It doesn't seem to show any significant performance impact on my test systems. Note, that on Lustre file systems, we can keep the default to no-locking as far as I can see, since the collective algorithm used by Lustre is unlikely to produce this pattern. I did add in however an mca parameter that allows us to control the locking algorithm used by the Lustre component as well, in case we need to change that for a particular use-case or platform. Fixes Issue open-mpi#12718 Signed-off-by: Edgar Gabriel <[email protected]>
@@ -93,6 +95,15 @@ lustre_register(void) | |||
MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, | |||
OPAL_INFO_LVL_9, | |||
MCA_BASE_VAR_SCOPE_READONLY, &mca_fs_lustre_stripe_width); | |||
mca_fs_lustre_lock_algorithm = 0; | |||
(void) mca_base_component_var_register(&mca_fs_lustre_component.fsm_version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (feel free to ignore) This could be more user friendly if it was implemented with mca_base_var_enum_t, but requires more code.
} | ||
} | ||
else { | ||
fh->f_flags |= OMPIO_LOCK_NEVER; | ||
} | ||
free (fstype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not knowing this code path very well, it looks to me like you handled Lustre and NFS just fine. However, is there any case other than NFS and Lustre? Seems that previously such a case would have set OMPIO_LOCK_NEVER but now it would be unset (LOCK_RANGES).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fs/ufs component handles everything that is not Lustre at the moment (e.g. NFS, local file systems, BeeGFS, etc.). We treat all of them equal except for NFS. Lustre requires special handling because it allows us to modify some additional parameters (stripe size, stripe count, etc.), hence it has its own component. So from the fs/ufs component perspective its either NFS (in which case we activate locking) or everything else (in which case we now do range locking instead of no locking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot that we also have separate components for GPFS and IME, not sure how much they are in use however. We also used to have separate components for PVFS2/OrangeFS, but that was removed a few weeks ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just re-read the commit message. Your intent is to change those paths from LOCK_NEVER to LOCK_RANGE, which is what you have done. 👍
} | ||
} | ||
else { | ||
fh->f_flags |= OMPIO_LOCK_NEVER; | ||
} | ||
free (fstype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just re-read the commit message. Your intent is to change those paths from LOCK_NEVER to LOCK_RANGE, which is what you have done. 👍
@lrbison thank you! |
The fs/ufs component by default disables all file locking before read/write operations (except for NFS file systems). This was based on the assumption, that the OS itself performs the required locking operation and hence we don't have to add to it.
This assumption is incorrect when using data sieving. In data sieving, the code 'ignores' small gaps when we write to a file, and perform instead a read-modify-write sequence ourselves for performance reasons. The OS can however not protect the gap in the read-modify-write sequence. In addition, within a collective operation not all aggregators might want to use data sieving. Hence, enabling locking just for the data-sieving routines is insufficient, all processes have to perform the locking. Therefore, our two options are: a) either disable write data-sieving by default, or b) enable range-locking by default.
After some testing, I think enabling range-locking be default is the safer and better approach. It doesn't seem to show a significant performance impact on my test systems.
Note, that on Lustre file systems, we can keep the default to no-locking as far as I can see, since the collective algorithm used by Lustre is unlikely (maybe even impossible) to produce this pattern. I did add in however an mca parameter that allows us to control the locking algorithm used by the Lustre component as well, in case we need to change that for a particular use-case or platform.
Fixes Issue #12718