Workaround for independent writes to Iterations in parallel, better detection of BP5 which in turn uncovers more instances of the first issue #1619
Conversation
Force-pushed from 237b394 to 8bbc170
@@ -946,10 +946,16 @@ void hipace_like_write(std::string const &file_ending)
    int const last_step = 100;
    int const my_first_step = i_mpi_rank * int(local_Nz);
    int const all_last_step = last_step + (i_mpi_size - 1) * int(local_Nz);

    bool participate_in_barrier = true;
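For context, a rough sketch of the loop structure that these bounds imply (not the test's actual code; the function name and the window condition are illustrative assumptions): each rank writes only during its own window of steps, but every rank keeps executing the per-step barrier so that collective operations stay matched across ranks.

#include <mpi.h>

// Rough sketch of the staggered step loop implied by the bounds above
// (illustrative only; names and the window condition are assumptions).
void staggered_step_loop(int my_first_step, int last_step, int all_last_step)
{
    for (int step = 0; step <= all_last_step; ++step)
    {
        bool const in_my_window =
            step >= my_first_step && step <= my_first_step + last_step;
        if (in_my_window)
        {
            // ... this rank stores its data for this step ...
        }
        // Every rank takes part in the barrier at every step; cf. the
        // participate_in_barrier flag introduced in the diff above.
        MPI_Barrier(MPI_COMM_WORLD);
    }
}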
@ax3l Can you please check if this bug also affects HiPACE++? Currently, the sequence of barriers and flushes doesn't match from rank to rank. This was uncovered only now, since flushing is effectively not collective in many situations, but this test now uses adios2::Engine::PerformDataWrite() of BP5, which is a bit stricter there.
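To make the failure mode concrete, here is a hedged illustration (a hypothetical snippet, not code from this PR or from HiPACE++): rank-dependent control flow around Series::flush() breaks once the flush maps to the collective adios2::Engine::PerformDataWrite() under BP5.

#include <mpi.h>
#include <openPMD/openPMD.hpp>

// Hypothetical snippet: some ranks flush, others go straight to the barrier,
// so the sequence of collective calls no longer matches across ranks.
void mismatched_flush(openPMD::Series &series, bool i_have_data_this_step)
{
    if (i_have_data_this_step) // true only on some ranks
    {
        series.flush(); // collective under BP5 via PerformDataWrite()
    }
    MPI_Barrier(MPI_COMM_WORLD); // non-flushing ranks arrive here instead
}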
Ok, great catch!
For HiPACE++, we changed the time stepping logic the other year, so that every MPI rank just writes to exactly one iteration. Thus, it cannot have this bug (anymore).
Force-pushed from 9dd2a14 to 72a465c
Somehow PerformDataWrite() leads to trouble with this pattern.
This reverts commit 36597bd. No longer needed after rebasing on fix-iteration-flush
It used Series::flush non-collectively
Force-pushed from ed25000 to 0a05f10
Follow-up to openPMD#1619
This somewhat fixes #1616 until we add a better solution. With this PR: seriesFlush() will always flush the containing Iteration if called from within an Iteration (and will ignore missing dirty annotations).

At the same time, I added better detection for BP5-specific features. Since this means that adios2::Engine::PerformDataWrite() is used automatically more often, this uncovers further parallel flushing bugs. So, these two items are treated together in this PR.

In a follow-up PR later on, as a more breaking change, we would also flush all open iterations in MPI-parallel contexts on series.flush(), but for this we will first need functionality to reopen iterations after close (#1592).

TODO:
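A minimal sketch of the user-facing behavior described above, assuming a Series already opened for writing; the helper function, mesh name, and dataset shape are illustrative and not part of this PR.

#include <openPMD/openPMD.hpp>
#include <cstdint>
#include <vector>

// Illustrative helper (not from this PR): write one chunk and flush from
// within the iteration via seriesFlush().
void write_and_flush(openPMD::Series &series, uint64_t step)
{
    auto it = series.writeIterations()[step];
    auto E_x = it.meshes["E"]["x"];

    std::vector<double> data(100, 1.0);
    E_x.resetDataset(openPMD::Dataset(openPMD::Datatype::DOUBLE, {100}));
    E_x.storeChunk(data, {0}, {100});

    // With this PR, seriesFlush() called from within an Iteration flushes
    // that containing Iteration, even without an explicit dirty flag.
    E_x.seriesFlush();
}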