
Performance: Flush IOHandler only once, not for each Iteration #1642

Merged
merged 8 commits into openPMD:dev on Jul 11, 2024

Conversation

franzpoeschel
Contributor

@franzpoeschel franzpoeschel commented Jun 27, 2024

This might address a performance regression seen by @guj.

Until now, Series::flushFileBased() and Series::flushGorVBased() flushed the IOHandler for every Iteration. Since flushing has a constant overhead, Series::flush() had linear complexity in the number of Iterations, even if only a single Iteration had modifications.

Fixing this was not entirely trivial: in file-based encoding, some frontend objects must unset their written flag, since they must be written anew for each file. Before this PR, this could easily be done synchronously in the frontend; but since all Iterations are now flushed at once, it must instead be done asynchronously in the backend, as a backend task.

Also speed up the common case of parsing flush options: No options at all.

  • Check if this addresses the performance regressions seen by @guj
  • Check this against PIConGPU; the new written logic might behave weirdly in more complex setups

@franzpoeschel
Contributor Author

I just tested this against @guj's data on Perlmutter and it clearly improves the performance.

@guj
Contributor

guj commented Jun 29, 2024

> I just tested this against @guj's data on Perlmutter and it clearly improves the performance.

Thanks Franz. I just checked out this branch. On my Mac the observed times are pretty much the same as before, but on Perlmutter it saved about 10% (20 seconds).

@franzpoeschel
Contributor Author

No difference at all? I observed a clear system-independent performance regression for issue380_f (/pscratch/sd/j/junmin/perlmutter/Jan2024/Test/issue380/bp5-f/defaultBP5-rank-ews_diags_f/diag1/openpmd_ bp f): the IOHandler is flushed for each Iteration even if no IOTasks are enqueued. The PR fuses all flushes into a single one, thus getting rid of this overhead.

You can further improve performance by using deferred iteration parsing:

```cpp
Series series = Series(fileName, readMode, "defer_iteration_parsing = true");
// ...
auto ex = series.iterations[i].open().meshes["E"]["x"];
```

And even further by closing the Iterations after using them:

```cpp
series.iterations[i].close();
```

Even without these improvements, I see a performance improvement far above just 20 seconds for going through 4000 Iterations.

@guj
Contributor

guj commented Jul 8, 2024

> No difference at all? I observed a clear system-independent performance regression for issue380_f: the IOHandler is flushed for each Iteration even if no IOTasks are enqueued. [...]

Ah, I was running issue380.cpp.

Yes, issue380_f has significant improvement after your fix!!

@franzpoeschel
Contributor Author

Great to hear!

> Ah, I was running issue380.cpp

Does that one have noticeable performance issues that we need to look into?

@guj
Contributor

guj commented Jul 8, 2024

> Does that one have noticeable performance issues that we need to look into?

This fix basically made the time gap between issue380.cpp and issue380_f.cpp disappear, so it would be great to merge it.

```diff
@@ -2071,6 +2071,7 @@ inline void fileBased_write_test(const std::string &backend)
         .makeConstant<double>(1.0);

     o.iterations[overlong_it].setTime(static_cast<double>(overlong_it));
+    o.flush();
```
Contributor Author


Note: This change makes the test stricter, since Series::flush() unlike the destructor will not swallow exceptions.

@franzpoeschel franzpoeschel enabled auto-merge (squash) July 11, 2024 10:17
@franzpoeschel franzpoeschel merged commit bda3544 into openPMD:dev Jul 11, 2024
30 of 31 checks passed
@ax3l ax3l requested a review from guj July 16, 2024 17:08
@ax3l ax3l added this to the 0.16.0 milestone Jul 16, 2024
3 participants