Conversation

Contributor

@DavidHuber-NOAA commented Dec 29, 2025

Description

This fixes a bug in CCPP that prevented the 06Z gfs_sfcanl job from running global_cycle on the 03Z IAU time. The code change now allows the input hour to be any integer between 0 and 23.
Resolves #4364
Refs #4408 (partially resolves but a full investigation is needed)
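
For illustration, a minimal sketch of the relationship this fix accommodates; the variable names and the check itself are hypothetical, not the actual CCPP code:

#!/usr/bin/env bash
# Hypothetical sketch: with a 6-hour IAU window, the 06Z cycle's analysis
# is valid at 03Z, so the 03Z hour must pass the 0-23 validity check that
# this PR relaxes.
cyc=06
iau_offset=3
iau_hour=$(( (10#${cyc} - iau_offset + 24) % 24 ))   # 06Z cycle -> 3 (03Z)
if (( iau_hour < 0 || iau_hour > 23 )); then
  echo "FATAL ERROR: input hour ${iau_hour} is not between 0 and 23" >&2
  exit 1
fi
echo "global_cycle input hour: ${iau_hour}"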

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

How has this been tested?

  • C96_atm3DVar_extended test on WCOSS2
  • UFS_Utils regression tests on Ursa (no change to baseline)
  • UFS regression tests
  • Full suite of GW tests on all platforms (when the UFS model hash is ready)

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added

@DavidHuber-NOAA added the GFS Change label ("This PR, if merged, will change results for the GFS.") Dec 29, 2025
@DavidHuber-NOAA changed the title from "Fix/gfs sfcanl" to "Fix 06z gfs_sfcanl jobs failing due to a bug in CCPP" Dec 29, 2025
Comment on lines 72 to 73
local MOM6_OUTPUT_DIR="${MOM6_OUTPUT_DIR:-./MOM6_OUTPUT}"
local MOM6_RESTART_DIR="${MOM6_RESTART_DIR:-./MOM6_RESTART}"
Contributor

Suggested change
-local MOM6_OUTPUT_DIR="${MOM6_OUTPUT_DIR:-./MOM6_OUTPUT}"
-local MOM6_RESTART_DIR="${MOM6_RESTART_DIR:-./MOM6_RESTART}"
+local MOM6_OUTPUT_DIR="./MOM6_OUTPUT"
+local MOM6_RESTART_DIR="./MOM6_RESTART"

This isn't an option. The workflow will always write to this space. The other instance where this variable is set in this manner is in ush/parsing_namelists_FV3.sh.
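
To make the distinction concrete, a minimal shell sketch (illustrative only; the demo_* function names are made up): the ${VAR:-default} form lets a caller's exported value override the default, while a plain assignment pins the path.

#!/usr/bin/env bash
# ${VAR:-default} keeps a value set by the caller and falls back otherwise...
demo_configurable() {
  local MOM6_OUTPUT_DIR="${MOM6_OUTPUT_DIR:-./MOM6_OUTPUT}"
  echo "configurable: ${MOM6_OUTPUT_DIR}"
}

# ...while a plain assignment always uses the hard-coded path.
demo_fixed() {
  local MOM6_OUTPUT_DIR="./MOM6_OUTPUT"
  echo "fixed: ${MOM6_OUTPUT_DIR}"
}

MOM6_OUTPUT_DIR=/some/other/dir demo_configurable   # -> /some/other/dir
MOM6_OUTPUT_DIR=/some/other/dir demo_fixed          # -> ./MOM6_OUTPUT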

Contributor

I agree, I don't think this should be configurable. This is assumed in other places.

I also see that in ufs-community/ufs-weather-model@6f22f57...f8b0802 a MOM6_OUTPUT_FH is defined, which is set in forecast_predet.

I'm assuming those are consistent definitions?

aerorahul previously approved these changes Jan 6, 2026

Contributor

@aerorahul left a comment

just one suggestion, looks good.

Contributor

@JessicaMeixner-NOAA left a comment

We should run an S2S test of some sort to make sure the ocean output looks okay given the changes. I didn't follow exactly when those changes went in. @jiandewang or @dpsarmie might know more about the MOM6_OUTPUT_FH changes and can make sure it is as expected here now.


Contributor

@dpsarmie commented Jan 6, 2026

FHOUT_OCN_GFS / FHOUT_OCN is the variable that controls the MOM6 output frequency in GW, and MOM6_OUTPUT_FH is just used to store an array of output times (in GW), correct? If so, then there might be an issue.

Contributor

@JessicaMeixner-NOAA

> FHOUT_OCN_GFS / FHOUT_OCN is the variable that controls the MOM6 output frequency in GW, and MOM6_OUTPUT_FH is just used to store an array of output times (in GW), correct? If so, then there might be an issue.

Yes. And looking a bit further, this variable should be "FHOUT_OCN" and not MOM6_OUTPUT_FH.
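
For clarity, a minimal sketch of the two roles under discussion (illustrative only; FHMAX_GFS and the loop are assumptions, not the actual forecast_predet code): FHOUT_OCN sets the output interval, while MOM6_OUTPUT_FH just stores the resulting list of output hours.

#!/usr/bin/env bash
# Illustrative sketch: FHOUT_OCN is the MOM6 output interval in hours;
# MOM6_OUTPUT_FH holds the derived array of output forecast hours.
# FHMAX_GFS and the loop are assumptions for this sketch.
FHOUT_OCN=${FHOUT_OCN:-6}
FHMAX_GFS=${FHMAX_GFS:-120}
MOM6_OUTPUT_FH=()
for (( fh = FHOUT_OCN; fh <= FHMAX_GFS; fh += FHOUT_OCN )); do
  MOM6_OUTPUT_FH+=("${fh}")
done
echo "MOM6 output hours: ${MOM6_OUTPUT_FH[*]}"   # 6 12 18 ... 120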

Contributor

@ClaraDraper-NOAA left a comment

I haven't tested it, but can confirm that this PR brings the needed update to CCPP/physics into UFS_UTILS to fix the gfs_sfcanl issue. It also updates the CCPP/physics used by the model to the same hash.

Note that updating the ufs_model and ufs_utils hashes brings in many changes beyond the gfs_sfcanl fix.

@DavidHuber-NOAA
Contributor Author

All tests passed on Ursa. I will run a develop C96C48mx500 case to verify the MOM6 output, then launch CI on all platforms.

@DavidHuber-NOAA
Contributor Author

I verified that the 6-hour MOM6 forecast for the C96C48mx500_S2SW_cyc_gfs case was identical to last night's nightly run of the same case, and also verified that MOM6 output was written every 6 hours as expected:

> cmp /scratch3/NCEPDEV/stmp/David.Huber/rt_4389/COMROOT/C96C48mx500_S2SW_cyc_gfs_4389/gfs.20211220/18/model/ocean/history/gfs.t18z.6hr_avg.f120.nc /scratch3/NCEPDEV/global/role.glopara/GFS_CI_CD/URSA/BUILDS/GITLAB/nightly_6ada5183_010726/RUNTESTS/COMROOT/C96C48mx500_S2SW_cyc_gfs_6ada5183-6840/gfs.20211220/18/model/ocean/history/gfs.t18z.6hr_avg.f120.nc
> echo $?
0

Proceeding with CI testing.

@DavidHuber-NOAA
Contributor Author

I am still wrong. The wavepostsbs metatask is already a dependency of the archiving jobs. Instead, we have a silent failure. It appears in /scratch3/NCEPDEV/global/role.glopara/GFS_CI_CD/HERA/BUILDS/GITLAB/pr_cases_4389_f6105b9e_7163/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_f6105b9e-7163/logs/2021032500/gfs_wavepostsbs_f015-f017.log during an MPMD section of the job, at line 11403:

3: + wave_grib2_sbs.sh[88] /scratch3/NCEPDEV/global/role.glopara/GFS_CI_CD/HERA/BUILDS/GITLAB/pr_cases_4389_f6105b9e_7163/global-workflow/exec/gfs_ww3_grib.x
3: + wave_grib2_sbs.sh[89] export err=126
3: + wave_grib2_sbs.sh[89] err=126
3: + wave_grib2_sbs.sh[90] [[ 126 -ne 0 ]]
3: + wave_grib2_sbs.sh[91] echo 'FATAL ERROR: gfs_ww3_grib.x returned non-zero status: 126; exiting!'
3: FATAL ERROR: gfs_ww3_grib.x returned non-zero status: 126; exiting!
3: + wave_grib2_sbs.sh[92] exit 126

Going to the end of the MPMD job at line 9891:

+ run_mpmd.sh[66] IFS=
+ run_mpmd.sh[66] read -r line
+ run_mpmd.sh[71] unset_strict
+ preamble.sh[45] set +eu
+ preamble.sh[46] set +o pipefail
+ run_mpmd.sh[73] srun -l --export=ALL --hint=nomultithread --multi-prog --output=mpmd.%j.%t.out -n 7 /scratch3/NCEPDEV/stmp/role.glopara/RUNDIRS/C48mx500_3DVarAOWCDA_f6105b9e-7163/gfs.2021032500/wavepostsbs_f017.82108/mpmd_cmdfile
+ run_mpmd.sh[74] err=0
+ run_mpmd.sh[75] set_strict
+ preamble.sh[35] [[ YES == \Y\E\S ]]
+ preamble.sh[37] set -eu
+ preamble.sh[39] set -o pipefail
+ run_mpmd.sh[101] [[ 0 -eq 0 ]]

It appears that run_mpmd.sh is not correctly catching failures of individual tasks: srun returns 0 even though one task exited with status 126, so err is set to 0 and the [[ 0 -eq 0 ]] check passes.
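
One possible way to harden this, sketched under assumptions (the mpmd.%j.%t.out names and the FATAL ERROR marker come from the trace above; everything else, including the file scan, is illustrative rather than the actual run_mpmd.sh logic):

#!/usr/bin/env bash
# Illustrative sketch: combine srun's exit status with a scan of the
# per-task output files so a single failed task cannot pass silently.
srun -l --export=ALL --hint=nomultithread --multi-prog \
     --output="mpmd.%j.%t.out" -n 7 mpmd_cmdfile
err=$?

# Even if srun itself returned 0, flag any task whose output file
# contains a fatal-error message.
for out in mpmd."${SLURM_JOB_ID}".*.out; do
  [[ -f "${out}" ]] || continue
  if grep -q "FATAL ERROR" "${out}"; then
    echo "FATAL ERROR detected in ${out}" >&2
    err=1
  fi
done

exit "${err}"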

@DavidHuber-NOAA removed the CI-Hera-Failed, CI-Gaeac6-Failed, and CI-Ursa-Failed labels Jan 16, 2026
@emcbot moved the Hera, Ursa, and Gaea C6 CI labels through Ready, Building, and Running to Passed Jan 16, 2026

Labels

CI-Gaeac6-Passed, CI-Hera-Passed, CI-Hercules-Passed, CI-Orion-Passed, CI-Ursa-Passed, CI-Wcoss2-Running, GFS Change


Development

Successfully merging this pull request may close these issues.

global_cycle reports a fatal error on 06Z gfs cycles

7 participants