@climbfuji climbfuji commented Jan 8, 2026

DRAFT PR FOR DISCUSSION

Description of Changes:

This PR changes how data is read in the CCPP init and timestep_init phases. Instead of every MPI task reading the same data independently, only the MPI root rank reads the data and then broadcasts it to all other ranks. This is implemented for all code except the GOCART aerosols (NEPTUNE doesn't use these, so we have no way to test them). Still to be checked: the new o3 and h2o code.

The implementation follows the path described in #1106: a new module mpiutil adds an MPI broadcast wrapper around the (now type-dependent) MPI interfaces in mpi_f08.

The CCPP MPI broadcast routines in this PR use a ccpp_abort function to stop the model in the event of an MPI error. This deliberately deviates from the CCPP requirements: it avoids having to pass errmsg and errflg all the way down and then back out to the host model to abort. Compliance with the current CCPP rules can be implemented, but it is worth discussing whether alternative methods are preferable and/or simplify the code. Note that the authoritative code in NCAR ccpp-physics in many places simply calls stop to abort the model. That is much worse than using MPI_ABORT and, of course, also not CCPP compliant. In NEPTUNE, we've used a function equivalent to ccpp_abort in these places.
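
For readers who want a concrete picture of the pattern described above, here is a minimal sketch, not the code in this PR. It assumes an MPI library that provides mpi_f08. The module name mpiutil_sketch, the generic name bcast_sketch, the specific routine names, and abort_sketch are all hypothetical stand-ins chosen for illustration.

```fortran
! Illustrative sketch only. All names except the mpi_f08 entities are hypothetical.
module mpiutil_sketch
   use mpi_f08
   implicit none
   private
   public :: bcast_sketch

   ! One generic name; the compiler selects the specific routine
   ! from the type and rank of the first argument.
   interface bcast_sketch
      module procedure bcast_int_scalar
      module procedure bcast_dp_1d
   end interface bcast_sketch

contains

   subroutine bcast_int_scalar(val, root, comm)
      integer,        intent(inout) :: val
      integer,        intent(in)    :: root
      type(MPI_Comm), intent(in)    :: comm
      integer :: ierr
      call MPI_Bcast(val, 1, MPI_INTEGER, root, comm, ierr)
      if (ierr /= MPI_SUCCESS) call abort_sketch(comm, "bcast_int_scalar")
   end subroutine bcast_int_scalar

   subroutine bcast_dp_1d(arr, root, comm)
      double precision, intent(inout) :: arr(:)
      integer,          intent(in)    :: root
      type(MPI_Comm),   intent(in)    :: comm
      integer :: ierr
      call MPI_Bcast(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr /= MPI_SUCCESS) call abort_sketch(comm, "bcast_dp_1d")
   end subroutine bcast_dp_1d

   ! Hypothetical stand-in for an abort helper: print where we failed, then stop all ranks.
   subroutine abort_sketch(comm, where)
      type(MPI_Comm),   intent(in) :: comm
      character(len=*), intent(in) :: where
      write(*,'(2a)') "abort_sketch: ", where
      call MPI_Abort(comm, 1)
   end subroutine abort_sketch

end module mpiutil_sketch

program read_and_bcast_demo
   use mpi_f08
   use mpiutil_sketch, only: bcast_sketch
   implicit none
   integer, parameter :: root = 0
   integer :: rank, n
   double precision, allocatable :: table(:)

   call MPI_Init()
   call MPI_Comm_rank(MPI_COMM_WORLD, rank)

   ! Only the root rank "reads" the data (a file read is replaced by constants here).
   n = 0
   if (rank == root) n = 3
   call bcast_sketch(n, root, MPI_COMM_WORLD)

   ! Every rank allocates the buffer; the root fills it and broadcasts it.
   allocate(table(n))
   if (rank == root) table = [1.0d0, 2.0d0, 3.0d0]
   call bcast_sketch(table, root, MPI_COMM_WORLD)

   call MPI_Finalize()
end program read_and_bcast_demo
```

The size is broadcast before the array so that non-root ranks can allocate a matching buffer. A real wrapper module would need one specific routine per type and rank combination that the physics code actually broadcasts.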

Sample output from a crash in ccpp_abort with GNU:

ccpp_abort: mpiutil.F90:bcast_ld0
#0  0x908cd223880 in ???
#1  0x254d6e5 in ccpp_abort
        at /home/dom/work/neptune-atmos/nep-check-ci-20260107/external/ccpp/ccpp-physics/physics/tools/mpiutil.F90:240
#2  0x254d77f in __mpiutil_MOD_bcast_ld0
        at /home/dom/work/neptune-atmos/nep-check-ci-20260107/external/ccpp/ccpp-physics/physics/tools/mpiutil.F90:219
#3  0x23e678b in __module_radiation_astronomy_MOD_sol_init
        at /home/dom/work/neptune-atmos/nep-check-ci-20260107/external/ccpp/ccpp-physics/physics/Radiation/radiation_astronomy.f:248
#4  0x208c1f7 in __gfs_rrtmg_setup_MOD_gfs_rrtmg_setup_init
        at /home/dom/work/neptune-atmos/nep-check-ci-20260107/external/ccpp/ccpp-physics/physics/Interstitials/UFS_SCM_NEPTUNE/GFS_rrtmg_setup.F90:217
#5  0x1e61f36 in __ccpp_nep_myn_tie_tho_n7_time_vary_cap_MOD_nep_myn_tie_tho_n7_time_vary_init_cap
        at /home/dom/work/neptune-atmos/nep-check-ci-20260107/ci/physics/suite-n7-fd-xs/workdir/build_gnu_drivers-esmx-forecast_debug-on/external/ccpp/ccpp-physics/ccpp_nep_myn_tie_tho_n7_time_vary_cap.F90:1664
#6  0x17a56d1 in __ccpp_nep_myn_tie_tho_n7_cap_MOD_nep_myn_tie_tho_n7_init_cap
        at /home/dom/work/neptune-atmos/nep-check-ci-20260107/ci/physics/suite-n7-fd-xs/workdir/build_gnu_drivers-esmx-forecast_debug-on/external/ccpp/ccpp-physics/ccpp_nep_myn_tie_tho_n7_cap.F90:181
#7  0x155ea34 in __ccpp_static_api_MOD_ccpp_physics_init
        at /home/dom/work/neptune-atmos/nep-check-ci-20260107/ci/physics/suite-n7-fd-xs/workdir/build_gnu_drivers-esmx-forecast_debug-on/external/ccpp/ccpp-physics/ccpp_static_api.F90:268
#8  ...

Tests Conducted:

This code is coming from the NRL fork of ccpp-physics. It has been tested extensively and is used in the latest code delivered for operational implementation. Reading with the MPI root rank and broadcasting solved the performance issues we have seen on large task counts that are described briefly in #1106.

We'll need to test these changes in the SCM and the UFS. For the UFS in particular, we want to look at bit-for-bit (b4b) reproducibility (results were zero-diff when we introduced this in NEPTUNE) and at the performance implications for production-size runs.

Dependencies:

None

Documentation:

I suggest we discuss the nitty gritty details of the implementation (how to stop the model if an error occurs in the broadcast routines) before we update the documentation in ccpp-doc.

Issue (optional):

Closes #1106

Contributors (optional):

@matusmartini (NRL)

…_ozone_forcing_data

2. Add NEPTUNE interstitials in physics/Interstitials/UFS_SCM_NEPTUNE/
…th mpiroot; move mpiutil.F90 to subdirectory tools
@climbfuji climbfuji self-assigned this Jan 8, 2026
integer, intent(out) :: ierr
call MPI_BCAST(arr, 1, MPI_INTEGER, root, comm, ierr)
if (ierr/=MPI_SUCCESS) then
   call ccpp_external_abort("mpiutil.F90:bcast_i32d0")
@climbfuji climbfuji Jan 8, 2026

To be discussed: should we replace all of these abort calls with error returns, and then return immediately from the calling CCPP scheme?
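
To make the alternative concrete for discussion, here is a rough sketch of what "return instead of abort" could look like. It is an assumption about one possible shape, not code from this PR: the module name, bcast_int_scalar_noabort, and example_scheme_init are hypothetical, and only the errmsg/errflg argument pair follows the usual CCPP convention.

```fortran
! Illustrative sketch only. All names except the mpi_f08 entities are hypothetical.
module bcast_return_sketch
   use mpi_f08
   implicit none
   private
   public :: bcast_int_scalar_noabort, example_scheme_init

contains

   ! Wrapper variant that hands the MPI error code back to the caller.
   ! Note: MPI only returns a non-success code here if the communicator's
   ! error handler is MPI_ERRORS_RETURN; the default handler is fatal.
   subroutine bcast_int_scalar_noabort(val, root, comm, ierr)
      integer,        intent(inout) :: val
      integer,        intent(in)    :: root
      type(MPI_Comm), intent(in)    :: comm
      integer,        intent(out)   :: ierr
      call MPI_Bcast(val, 1, MPI_INTEGER, root, comm, ierr)
   end subroutine bcast_int_scalar_noabort

   ! Hypothetical CCPP-style init routine: sets errmsg/errflg and returns
   ! immediately, leaving the decision to stop the model to the host.
   subroutine example_scheme_init(mpicomm, mpiroot, nlev, errmsg, errflg)
      type(MPI_Comm),   intent(in)    :: mpicomm
      integer,          intent(in)    :: mpiroot
      integer,          intent(inout) :: nlev
      character(len=*), intent(out)   :: errmsg
      integer,          intent(out)   :: errflg
      integer :: ierr

      errmsg = ''
      errflg = 0

      call bcast_int_scalar_noabort(nlev, mpiroot, mpicomm, ierr)
      if (ierr /= MPI_SUCCESS) then
         errflg = 1
         errmsg = 'example_scheme_init: MPI_Bcast failed for nlev'
         return
      end if
   end subroutine example_scheme_init

end module bcast_return_sketch
```

The cost of this variant is the one named in the PR description: every call site needs its own check and return, and errmsg/errflg have to be threaded through each layer between the broadcast and the host model.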

@climbfuji climbfuji changed the title from "Read and broadcast data from MPI root rank during init and timestep init phase in GFS time vary" to "Read and broadcast data from MPI root rank during init and timestep init phase in GFS time vary; add time vary interstitials for NEPTUNE" Jan 8, 2026
…*.F90 when writing to errmsg for invalid w3kindreal/w3kindint; additionally: formatting updates
…th mpiroot; move mpiutil.F90 to subdirectory tools
Development

Successfully merging this pull request may close these issues.

CCPP MPI interface