Prepare osc framework for bigcount #12379
Thanks @hppritcha! It mostly looks good, just two points:
- I'm afraid all of the verbose output statements need to be adjusted, like: https://github.com/open-mpi/ompi/pull/12379/files#diff-e4bd30eae59fa8aabf9ee87dcb523aa60a292e7c09d1e84e014c1ae7e13e1e3cR1145
- I think the disp_units members of the modules need to become ptrdiff_t (currently int) since MPI_Win_create_c takes MPI_Aint for the displacements.
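To illustrate the second point, here is a minimal sketch of why an int field is too narrow once the displacement unit arrives as an MPI_Aint. It uses ptrdiff_t as a stand-in for MPI_Aint (an assumption that holds on common platforms), and the struct and function names are hypothetical, not Open MPI's actual module layout.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for per-window module state.
 * Field and type names are illustrative only. */
struct module_int_disp  { int       disp_unit; };  /* before: may truncate */
struct module_wide_disp { ptrdiff_t disp_unit; };  /* after: address-sized */

/* Returns 1 if a wide (MPI_Aint-like) displacement unit would survive
 * being stored in an int field, 0 if it would be truncated. */
static int disp_unit_fits_in_int(ptrdiff_t disp_unit)
{
    return disp_unit >= INT_MIN && disp_unit <= INT_MAX;
}

/* Byte offset of element target_disp in a window, computed entirely
 * in ptrdiff_t so no intermediate value is narrowed to int. */
static ptrdiff_t target_offset(const struct module_wide_disp *m,
                               ptrdiff_t target_disp)
{
    return m->disp_unit * target_disp;
}
```

With a wide field, a unit of 8 and a target displacement of 3 gives a byte offset of 24, while a unit just above INT_MAX is reported as not fitting in the old int field.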
Force-pushed from 0a66a0b to 449e59a
Running AWS CI
AWS CI failed due to OMB onesided suite. Will provide more info.
You mean IMB, right?
Example failure
Backtrace
I'm having problems reproducing your findings, @wenduwan. What are your config options, and which compiler did you use? And also the processor type?
Note that I don't imagine libfabric to be an issue, since the failure happens in sm. The segfault happens on:
Thanks. The test passed.
@devreal could you take a look at this PR again to see if your questions were addressed?
There are a few places where the format strings need to be adjusted
ompi/mca/osc/rdma/osc_rdma_comm.c
@@ -923,7 +923,7 @@ int ompi_osc_rdma_rget (void *origin_addr, int origin_count, ompi_datatype_t *or
 ompi_osc_rdma_sync_t *sync;
 int ret;

-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rget: 0x%lx, %d, %s, %d, %d, %d, %s, %s", (unsigned long) origin_addr,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rget: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
origin_count is now size_t:
-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rget: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rget: 0x%lx, %zu, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
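As a rough illustration of the kind of fix being requested, the sketch below formats a size_t count with %zu and a ptrdiff_t displacement with %td, the length modifiers C99 defines for those types. The helper name and its reduced argument list are hypothetical and only mimic the shape of the trace line. It is not the actual OSC_RDMA_VERBOSE call.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes a shortened "rget" trace line into buf.
 * %zu matches size_t and %td matches ptrdiff_t; passing either to %d
 * is undefined behavior on platforms where they are wider than int. */
static int format_rget_trace(char *buf, size_t len, const void *origin_addr,
                             size_t origin_count, ptrdiff_t source_disp)
{
    return snprintf(buf, len, "rget: 0x%lx, %zu, %td",
                    (unsigned long) (uintptr_t) origin_addr,
                    origin_count, source_disp);
}
```

Compilers that check format strings (for example with -Wformat) will typically flag a leftover %d paired with a size_t argument, which is a cheap way to find the remaining call sites.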
ompi/mca/osc/rdma/osc_rdma_comm.c
 ompi_datatype_t *source_datatype, ompi_win_t *win)
 {
 ompi_osc_rdma_module_t *module = GET_MODULE(win);
 ompi_osc_rdma_peer_t *peer;
 ompi_osc_rdma_sync_t *sync;

-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "get: 0x%lx, %d, %s, %d, %d, %d, %s, %s", (unsigned long) origin_addr,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "get: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
origin_count is now size_t:
-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "get: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "get: 0x%lx, %zu, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
ompi/mca/osc/rdma/osc_rdma_comm.c
@@ -867,7 +867,7 @@ int ompi_osc_rdma_rput (const void *origin_addr, int origin_count, ompi_datatype
 ompi_osc_rdma_sync_t *sync;
 int ret;

-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rput: 0x%lx, %d, %s, %d, %d, %d, %s, %s", (unsigned long) origin_addr, origin_count,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rput: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr, origin_count,
origin_count is now size_t:
-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rput: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr, origin_count,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "rput: 0x%lx, %zu, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr, origin_count,
ompi/mca/osc/rdma/osc_rdma_comm.c
 ompi_datatype_t *target_datatype, ompi_win_t *win)
 {
 ompi_osc_rdma_module_t *module = GET_MODULE(win);
 ompi_osc_rdma_peer_t *peer;
 ompi_osc_rdma_sync_t *sync;

-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "put: 0x%lx, %d, %s, %d, %d, %d, %s, %s", (unsigned long) origin_addr,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "put: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
origin_count is now size_t:
-OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "put: 0x%lx, %d, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
+OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "put: 0x%lx, %zu, %s, %d, %d, %zu, %s, %s", (unsigned long) origin_addr,
@@ -3194,7 +3194,7 @@ ompi_osc_portals4_get_accumulate(const void *origin_addr,
 ptrdiff_t length, origin_lb, target_lb, result_lb, extent;

 OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output,
-"get_accumulate: 0x%lx, %d, %s, 0x%lx, %d, %s, %d, %lu, %d, %s, %s, 0x%lx",
+"get_accumulate: 0x%lx, %zu, %s, 0x%lx, %d, %s, %d, %lu, %zu, %s, %s, 0x%lx",
-"get_accumulate: 0x%lx, %zu, %s, 0x%lx, %d, %s, %d, %lu, %zu, %s, %s, 0x%lx",
+"get_accumulate: 0x%lx, %zu, %s, 0x%lx, %zu, %s, %d, %lu, %zu, %s, %s, 0x%lx",
Force-pushed from fe3e9ed to 4846192
Update the osc framework to use size_t for counts and ptrdiff_t for displacements. Signed-off-by: Jake Tronge <[email protected]>
Force-pushed from 4846192 to 8bcbe17
Thanks @devreal. I think all the format strings should be fixed.
This updates the osc framework to use size_t for counts and ptrdiff_t for displacements.
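
One way to picture the type change is an int-based caller forwarding into a size_t/ptrdiff_t interface. The sketch below is illustrative only: both function names are hypothetical and the "transfer" is reduced to returning an element count. The point it shows is that negative int counts need to be rejected before widening, since casting a negative int to size_t produces a very large value.

```c
#include <stddef.h>

/* Hypothetical bigcount-style entry point: counts are size_t and the
 * displacement is ptrdiff_t. For illustration it just reports how many
 * elements it would move (the smaller of the two counts). */
static size_t sketch_put_bigcount(size_t origin_count, ptrdiff_t target_disp,
                                  size_t target_count)
{
    (void) target_disp;  /* unused in this reduced sketch */
    return origin_count < target_count ? origin_count : target_count;
}

/* Hypothetical int-based caller: validates, then widens to the new types.
 * Returns 0 on success and -1 if either count is negative. */
static int sketch_put_int(int origin_count, int target_disp, int target_count,
                          size_t *moved)
{
    if (origin_count < 0 || target_count < 0) {
        return -1;  /* a negative int would wrap to a huge size_t */
    }
    *moved = sketch_put_bigcount((size_t) origin_count,
                                 (ptrdiff_t) target_disp,
                                 (size_t) target_count);
    return 0;
}
```

Displacements stay signed on purpose: ptrdiff_t can represent a negative offset, whereas counts cannot meaningfully be negative.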