Read and broadcast data from MPI root rank during init and timestep init phase in GFS time vary; add time vary interstitials for NEPTUNE #1187
DRAFT PR FOR DISCUSSION
Description of Changes:
This PR modifies how data is read in the CCPP `init` and `timestep_init` phases. Instead of every single MPI task reading the data serially, the data is read by the MPI root rank and then broadcast. This is implemented for all code except the GOCART aerosols (NEPTUNE doesn't use these, hence we have no way to test them; also still to check: the new o3 and h2o code).

The implementation takes the path described in #1106: an MPI broadcast wrapper is added in a new module `mpiutils`, which wraps around the (now type-dependent) MPI interfaces in `mpi_f08`.

The CCPP MPI broadcast routines in this PR use a `ccpp_abort` function to stop the model in the event of an MPI error. This deviates from the CCPP requirements in order to avoid having to pass `errmsg` and `errflg` all the way down and then back out to the host model to abort. Compliance with the current CCPP rules can be implemented, but it is worth discussing whether alternative methods are preferable and/or simplify the code. Note that the authoritative code in NCAR ccpp-physics in many places simply calls `stop` to abort the model. That is much worse than using `MPI_ABORT`, and of course also not CCPP compliant. In NEPTUNE, we've used a function equivalent to `ccpp_abort` in these places.

Sample output from a crash in `ccpp_abort` with GNU:

Tests Conducted:
This code comes from the NRL fork of ccpp-physics. It has been tested extensively and is used in the latest code delivered for operational implementation. Reading with the MPI root rank and broadcasting solved the performance issues we saw on large task counts, which are described briefly in #1106.
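Why a single reader helps at large task counts can be illustrated with a toy, single-process model. This is a hedged sketch, not the NEPTUNE or ccpp-physics code: ranks are simulated as list entries, the `CountingFile` class and both function names are hypothetical, and no MPI library is involved. It only counts how often the shared input file is read under each pattern; it says nothing about actual timings.

```python
"""Toy model: 'every rank reads' vs 'root reads, then broadcasts'.

Assumption: no real MPI here. Ranks are list entries and a read counter
stands in for filesystem load.
"""


class CountingFile:
    """Hypothetical stand-in for a shared input file; counts reads."""

    def __init__(self, contents):
        self.contents = list(contents)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self.contents)


def every_rank_reads(shared_file, ntasks):
    # Serial pattern: each of the ntasks ranks opens and reads the file itself.
    return [shared_file.read() for _ in range(ntasks)]


def root_reads_then_broadcasts(shared_file, ntasks):
    # Only the root rank touches the file ...
    data_on_root = shared_file.read()
    # ... and a broadcast gives every rank its own copy of the root's data
    # (in MPI this would be a collective such as MPI_Bcast).
    return [list(data_on_root) for _ in range(ntasks)]


if __name__ == "__main__":
    ntasks = 1024
    f1 = CountingFile([1.0, 2.0, 3.0])
    f2 = CountingFile([1.0, 2.0, 3.0])
    serial = every_rank_reads(f1, ntasks)
    bcast = root_reads_then_broadcasts(f2, ntasks)
    assert serial == bcast  # every rank ends up with identical data
    print(f1.reads, f2.reads)  # prints: 1024 1
```

Every rank ends up with the same data either way, which is consistent with the zero-diff results mentioned below; what changes is that the filesystem sees one read instead of one per task.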
We'll need to test these changes in the SCM and the UFS; in particular for the latter, we want to look at b4b reproducibility (results were zero-diff when we introduced this in NEPTUNE) and at performance implications for production-size runs.
Dependencies:
None
Documentation:
I suggest we discuss the nitty-gritty details of the implementation (i.e., how to stop the model if an error occurs in the broadcast routines) before we update the documentation in ccpp-doc.
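To make the error-handling question concrete, here is a minimal Python sketch contrasting the two styles. It is not the Fortran code in this PR: `fake_bcast` is a hypothetical broadcast that returns an integer error code (0 = success), `abort_model` is a hypothetical stand-in for `ccpp_abort`/`MPI_ABORT` modelled by raising `SystemExit`, and `read_and_bcast_scheme` is a made-up intermediate layer. The second style mirrors the CCPP convention of returning `errmsg`/`errflg` to the caller.

```python
"""Two error-handling styles for a broadcast wrapper (illustrative only)."""


def fake_bcast(value, fail=False):
    # Hypothetical broadcast: returns (value, ierr); ierr != 0 signals an MPI error.
    return (value, 1) if fail else (value, 0)


def abort_model(msg):
    # Stand-in for ccpp_abort: stop immediately, no unwinding to the host model.
    raise SystemExit(f"ccpp_abort: {msg}")


# Style 1 (as in this PR): the wrapper aborts, so callers need no error arguments.
def bcast_or_abort(value, fail=False):
    value, ierr = fake_bcast(value, fail)
    if ierr != 0:
        abort_model(f"broadcast failed with ierr={ierr}")
    return value


# Style 2 (CCPP rules): return errflg/errmsg and let each caller pass them up
# until the host model decides how to stop.
def bcast_with_status(value, fail=False):
    value, ierr = fake_bcast(value, fail)
    if ierr != 0:
        return value, 1, f"broadcast failed with ierr={ierr}"
    return value, 0, ""


def read_and_bcast_scheme(fail=False):
    # Every intermediate layer must check errflg and return early; this is the
    # plumbing that the ccpp_abort approach avoids.
    data, errflg, errmsg = bcast_with_status([1, 2, 3], fail)
    if errflg != 0:
        return None, errflg, f"read_and_bcast_scheme: {errmsg}"
    return data, 0, ""
```

Style 1 keeps argument lists short but takes the decision to stop away from the host model; style 2 is CCPP compliant at the cost of threading `errflg`/`errmsg` through every layer between the broadcast and the host.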
Issue (optional):
Closes #1106
Contributors (optional):
@matusmartini (NRL)