

@NeilBarton-NOAA
Contributor

Description

SFS's use of the GLORe ICs is routinely broken when develop is merged into dev/sfs. To minimize manual merge conflicts, this PR introduces the changes needed to run with GLORe ICs, along with a CI test.

ICs for the CI are currently at
ursa:/scratch4/NCEPDEV/stmp/Neil.Barton/ICs/CPC/C96mx025

Updates the changes from #4309 by moving MOM6_INTERP_ICS to config.base.

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this change expected to change outputs (e.g. value changes to existing outputs, new files stored in COM, files removed from COM, filename changes, additions/subtractions to archives)? YES/NO (If YES, please indicate to which system(s))
    • GFS
    • GEFS
    • SFS
    • GCAFS
  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

Ran the forecast-only CI tests and C96_gcafs_cycled_noDA on Ursa. Note: prep_emissions fails for me due to file permission issues, but stage_ic succeeds.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@NeilBarton-NOAA NeilBarton-NOAA changed the title from "Sfs cpc i cs" to "SFS GLORe ICs" on Dec 18, 2025
@TerrenceMcGuinness-NOAA
Collaborator

This all looks great. Because this PR updates the pipeline scripts, it will have to be tested outside the automated system. I will run it through the CI pipeline manually this afternoon.

@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Dec 23, 2025
@emcbot

emcbot commented Dec 23, 2025

C96mx025_S2S FAILED on Hercules (pipeline ID: 6606)

In directory: /work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_f4f1d1e0_6606/RUNTESTS/EXPDIR/C96mx025_S2S_f4f1d1e0-6606

Error Log Files:


/work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_f4f1d1e0_6606/RUNTESTS/COMROOT/C96mx025_S2S_f4f1d1e0-6606/logs/1994050100/sfs_stage_ic.log

View Error Logs: (sfs_stage_ic.log)

This failure was detected automatically by global-workflow's CI/CD Pipeline

@emcbot emcbot added CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Dec 23, 2025
@emcbot

emcbot commented Dec 23, 2025

C96mx100_S2S FAILED on Hercules (pipeline ID: 6606)

In directory: /work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_f4f1d1e0_6606/RUNTESTS/EXPDIR/C96mx100_S2S_f4f1d1e0-6606

Error Log Files:


/work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_f4f1d1e0_6606/RUNTESTS/COMROOT/C96mx100_S2S_f4f1d1e0-6606/logs/1994050100/sfs_atmos_prod_mem000_f000.log
/work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_f4f1d1e0_6606/RUNTESTS/COMROOT/C96mx100_S2S_f4f1d1e0-6606/logs/1994050100/sfs_atmos_prod_mem001_f000.log
/work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_f4f1d1e0_6606/RUNTESTS/COMROOT/C96mx100_S2S_f4f1d1e0-6606/logs/1994050100/sfs_atmos_prod_mem002_f000.log

View Error Logs: (sfs_atmos_prod_mem000_f000.log) (sfs_atmos_prod_mem001_f000.log) (sfs_atmos_prod_mem002_f000.log)

This failure was detected automatically by global-workflow's CI/CD Pipeline

@TerrenceMcGuinness-NOAA
Collaborator

TerrenceMcGuinness-NOAA commented Dec 23, 2025

It appears that wxflow's file_utils does not support copying symlinked files:

mterry (hercules-login-3) bin $ grep cice_model.res.nc /work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_f4f1d1e0_6606/RUNTESTS/COMROOT/C96mx025_S2S_f4f1d1e0-6606/logs/1994050100/sfs_stage_ic.log | grep ERROR
2025-12-23 10:22:09,392 - ERROR    - file_utils  : Target file '/work/noaa/global/glopara/data/ICSDIR/C96mx025/20251217/sfs.19940430/18/mem001/model/ice/restart/19940501.000000.cice_model.res.nc' does not exist and is required, ABORT!

mterry (hercules-login-3) bin $ ls /work/noaa/global/glopara/data/ICSDIR/C96mx025/20251217/sfs.19940430/18/mem001/model/ice/restart/19940501.000000.cice_model.res.nc -l
lrwxrwxrwx 1 role-global stmp 126 Dec  9 11:06 /work/noaa/global/glopara/data/ICSDIR/C96mx025/20251217/sfs.19940430/18/mem001/model/ice/restart/19940501.000000.cice_model.res.nc -> /scratch4/NCEPDEV/stmp/Neil.Barton/ICs/CPC/C96mx025/sfs.19940430/18/mem000/model/ice/restart/19940501.000000.cice_model.res.nc
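The ls -l output shows that the staged file is an absolute symlink into /scratch4 on Ursa, a path that presumably does not exist on Hercules; a dangling link like that would explain the "does not exist" error. A minimal sketch, assuming Python 3.9+, of a pre-staging check that lists dangling links under an IC directory (find_dangling_symlinks is a hypothetical helper, not a wxflow function):

```python
import os


def find_dangling_symlinks(root: str) -> list[str]:
    """Return every symlink under `root` whose target does not exist here.

    os.path.islink() inspects the link itself, while os.path.exists()
    follows it, so a link that is present but unresolvable is dangling.
    """
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A dangling link is listed in filenames; a link to an existing
        # directory is listed in dirnames, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)
```

Run against the mem001 IC directory above, this check would be expected to report the cice_model.res.nc link; os.readlink() on each result shows where it points.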

@NOAA-EMC NOAA-EMC deleted a comment from emcbot Dec 23, 2025
@NOAA-EMC NOAA-EMC deleted a comment from emcbot Dec 23, 2025
@NOAA-EMC NOAA-EMC deleted a comment from emcbot Dec 23, 2025
@NOAA-EMC NOAA-EMC deleted a comment from emcbot Dec 23, 2025
@NeilBarton-NOAA
Contributor Author

@TerrenceMcGuinness-NOAA The glopara soft links for the non-control members' ICs are incorrect. They point to my directory, but they should point to the mem000 directory.
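A minimal sketch of a check for the layout described above, assuming each non-control member (mem001, mem002, ...) should contain only symlinks that resolve into the mem000 tree (links_point_to_control is a hypothetical helper):

```python
import os


def links_point_to_control(member_dir: str, control_dir: str) -> dict[str, bool]:
    """Map each symlinked file under member_dir to whether it resolves into control_dir.

    Assumption: non-control members should contain only links into the
    control member's tree (mem000).
    """
    control_real = os.path.realpath(control_dir)
    results = {}
    for dirpath, _dirnames, filenames in os.walk(member_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            # realpath() resolves the whole link chain; for a dangling link it
            # still returns the (non-existent) target path.
            target = os.path.realpath(path)
            inside = os.path.commonpath([control_real, target]) == control_real
            results[os.path.relpath(path, member_dir)] = inside
    return results
```

Any entry mapped to False is a link that escapes mem000, such as one pointing into a personal /scratch4 directory.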

@DavidHuber-NOAA
Contributor

@NeilBarton-NOAA can you stage ICs with relative links on Ursa? Once complete, I will replace the existing ICs with the corrected ones.

For future reference, all I do when adding ICs (or fix data, etc.) to the glopara space is run rsync -avh <source(s)> <glopara_target_dir>/. All links in the source should therefore be relative so that they still resolve in the glopara target destination.
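Because rsync -a implies -l, symlinks are copied as symlinks with their link text unchanged: an absolute link still points at the original Ursa path after the copy, while a relative one resolves inside the new tree. A minimal sketch of creating relative member links (link_member_to_control is a hypothetical helper):

```python
import os


def link_member_to_control(control_file: str, member_file: str) -> str:
    """Create member_file as a *relative* symlink to control_file.

    The link text is computed relative to the link's own directory, so the
    tree can be copied elsewhere (e.g. with `rsync -a`, which copies
    symlinks as symlinks) and the link still resolves. Returns the link text.
    """
    os.makedirs(os.path.dirname(member_file), exist_ok=True)
    rel_target = os.path.relpath(control_file, start=os.path.dirname(member_file))
    os.symlink(rel_target, member_file)
    return rel_target
```

For mem001/model/ice/restart/<file> this produces link text of the form ../../../../mem000/model/ice/restart/<file>, which stays valid after the whole sfs.YYYYMMDD/HH tree is synced to glopara.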

@NeilBarton-NOAA
Contributor Author

@DavidHuber-NOAA I fixed the softlinks at
ursa:/scratch4/NCEPDEV/stmp/Neil.Barton/ICs/CPC/C96mx025

@DavidHuber-NOAA
Contributor

@NeilBarton-NOAA data has been staged on all platforms. Re-launching CI on Hercules.

@DavidHuber-NOAA DavidHuber-NOAA removed the CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed label Jan 6, 2026
@emcbot emcbot added CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Jan 6, 2026
@emcbot

emcbot commented Jan 6, 2026

C96C48_hybatmDA FAILED on Hercules (pipeline ID: 6810)

In directory: /work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_bc6c2927_6810/RUNTESTS/EXPDIR/C96C48_hybatmDA_bc6c2927-6810

Error Log Files:


/work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_bc6c2927_6810/RUNTESTS/COMROOT/C96C48_hybatmDA_bc6c2927-6810/logs/2021122106/enkfgdas_fcst_mem001.log
/work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_bc6c2927_6810/RUNTESTS/COMROOT/C96C48_hybatmDA_bc6c2927-6810/logs/2021122106/enkfgdas_fcst_mem002.log

View Error Logs: (enkfgdas_fcst_mem001.log) (enkfgdas_fcst_mem002.log)

This failure was detected automatically by global-workflow's CI/CD Pipeline

@emcbot emcbot added CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Jan 6, 2026
@NeilBarton-NOAA
Contributor Author

In /work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_bc6c2927_6810/RUNTESTS/COMROOT/C96C48_hybatmDA_bc6c2927-6810/logs/2021122106/enkfgdas_fcst_mem001.log.0:

-> FATAL from PE 1: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability

@DavidHuber-NOAA does this error occur in other CI tests?

@DavidHuber-NOAA
Contributor

DavidHuber-NOAA commented Jan 7, 2026

FYI @RussTreadon-NOAA @CoryMartin-NOAA
@NeilBarton-NOAA No, we haven't seen this in other PRs, and the nightly suite of tests ran fine on Hercules yesterday. However, we have periodically seen erroneous NaNs from the EnKF on Hercules, though these usually caused the forecast to fail when applying the increment (see #4348). The 'fix' for that issue was to increase the memory requests for the enkfgdas_eupd job on Hercules, but this may suggest that we need to rethink it. I will reopen #4348.

Also, I see that the C96mx025_S2S case did not run on Hercules. I forgot that GitLab requires the gitlab-ci-hosts.yml file to be in the develop branch to run any given test. I will run this test manually.

@DavidHuber-NOAA
Copy link
Contributor

@NeilBarton-NOAA @RussTreadon-NOAA @CoryMartin-NOAA I see now that I misread the log message. This message is identical to that reported in #4348, thus it is most likely the same issue. Apologies for any confusion.

@DavidHuber-NOAA DavidHuber-NOAA added CI-Hercules-Running (CM) CI testing is being run locally on Hercules. and removed CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed labels Jan 7, 2026
@RussTreadon-NOAA
Contributor

A check of /work2/noaa/global/role-global/GFS_CI_CD/HERCULES/BUILDS/GITLAB/pr_cases_4359_bc6c2927_6810/RUNTESTS/COMROOT/C96C48_hybatmDA_bc6c2927-6810/enkfgdas.20211221/06/ensstat/analysis/atmos/enkfgdas_eupd.log shows unphysical EnKF analysis increments:

 0:  time level            1
 0:  --------------
 0: ens. mean anal. increment min/max  u   -11608.5214844        13056.9550781
 0: ens. mean anal. increment min/max  v   -9908.88671875        8727.07031250
 0: ens. mean anal. increment min/max  tv   -5696.84033203        7490.68652344
 0: ens. mean anal. increment min/max  q   -1.85283851624       0.857560813427
 0: ens. mean anal. increment min/max  oz  -0.113994069397E-02   0.825737719424E-03
 0: ens. mean anal. increment min/max  ps   -1091.93530273        642.401550293
 0: ens. mean anal. increment min/max  st1   -594.956726074        821.693603516
 0: ens. mean anal. increment min/max  st2   -196.260498047        295.271087646
 0: ens. mean anal. increment min/max  st3   -123.664749146        154.170471191
 0: ens. mean anal. increment min/max  sl1   -10.7356119156        3.89627695084
 0: ens. mean anal. increment min/max  sl2   -18.3905010223        12.2464389801
 0: ens. mean anal. increment min/max  sl3   -10.8950519562        5.28550529480
 0:  time level            2
 0:  --------------
 0: ens. mean anal. increment min/max  u   -12994.9707031        10239.5566406
 0: ens. mean anal. increment min/max  v   -10368.7382812        10712.7773438
 0: ens. mean anal. increment min/max  tv   -7201.08398438        5558.74511719
 0: ens. mean anal. increment min/max  q   -1.86006522179       0.999930679798
 0: ens. mean anal. increment min/max  oz  -0.124473567121E-02   0.877812970430E-03
 0: ens. mean anal. increment min/max  ps   -1012.11206055        405.562408447
 0: ens. mean anal. increment min/max  st1   -475.964324951        1150.36181641
 0: ens. mean anal. increment min/max  st2   -202.577484131        309.345214844
 0: ens. mean anal. increment min/max  st3   -119.290939331        154.937789917
 0: ens. mean anal. increment min/max  sl1   -12.0871276855        3.90066051483
 0: ens. mean anal. increment min/max  sl2   -18.1210784912        12.2423419952
 0: ens. mean anal. increment min/max  sl3   -10.9974822998        5.28750991821
 0:  time level            3
 0:  --------------
 0: ens. mean anal. increment min/max  u   -20227.9160156        9657.53906250
 0: ens. mean anal. increment min/max  v   -8430.89843750        19950.1816406
 0: ens. mean anal. increment min/max  tv   -4890.57812500        6390.98828125
 0: ens. mean anal. increment min/max  q   -1.72090220451        1.08359122276
 0: ens. mean anal. increment min/max  oz  -0.107280258089E-02   0.898180995136E-03
 0: ens. mean anal. increment min/max  ps   -1118.31860352        630.666748047
 0: ens. mean anal. increment min/max  st1   -749.712097168        2925.41650391
 0: ens. mean anal. increment min/max  st2   -196.014633179        313.488891602
 0: ens. mean anal. increment min/max  st3   -114.674911499        156.164077759
 0: ens. mean anal. increment min/max  sl1   -12.9224348068        3.90812015533
 0: ens. mean anal. increment min/max  sl2   -17.8044586182        12.2385778427
 0: ens. mean anal. increment min/max  sl3   -11.1035232544        5.28987169266
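Increments of order 10^4 in u and v are easy to spot by eye here, but a small log scan could flag them automatically. A minimal sketch that parses the "ens. mean anal. increment min/max" lines above and flags variables whose increment magnitude exceeds a threshold; the limits in DEFAULT_LIMITS are illustrative assumptions, not GSI/EnKF settings:

```python
import re

# Matches lines such as:
#  0: ens. mean anal. increment min/max  u   -11608.5214844        13056.9550781
# Fortran E-notation (e.g. -0.113994069397E-02) is accepted by float().
_INC_RE = re.compile(r"ens\. mean anal\. increment min/max\s+(\w+)\s+(\S+)\s+(\S+)")
_LEVEL_RE = re.compile(r"time level\s+(\d+)")

# Hypothetical plausibility limits on |increment| (u, v in m/s; tv in K).
# These are NOT values used by GSI/EnKF; tune them for real use.
DEFAULT_LIMITS = {"u": 100.0, "v": 100.0, "tv": 50.0}


def flag_suspect_increments(log_text, limits=None):
    """Return (time_level, var, min, max) for each increment exceeding its limit."""
    limits = DEFAULT_LIMITS if limits is None else limits
    suspect = []
    level = None
    for line in log_text.splitlines():
        m_level = _LEVEL_RE.search(line)
        if m_level:
            level = int(m_level.group(1))
            continue
        m = _INC_RE.search(line)
        if not m:
            continue
        var, lo, hi = m.group(1), float(m.group(2)), float(m.group(3))
        limit = limits.get(var)
        # Variables without a configured limit (e.g. q, oz, ps) are skipped.
        if limit is not None and max(abs(lo), abs(hi)) > limit:
            suspect.append((level, var, lo, hi))
    return suspect
```

Applied to the excerpt above, the default limits would flag u, v and tv at every time level.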

@SamuelDegelia-NOAA found similarly spurious EnKF analysis increments when testing rrfs/v1.0.0 on WCOSS2. See GSI PR #961 for details.

@DavidHuber-NOAA
Contributor

The C96mx025_S2S test passed. Merging.

@DavidHuber-NOAA DavidHuber-NOAA added CI-Hercules-Passed (cm) Manual CI passed on Hercules and removed CI-Hercules-Running (CM) CI testing is being run locally on Hercules. labels Jan 7, 2026
@DavidHuber-NOAA DavidHuber-NOAA merged commit 8228066 into NOAA-EMC:develop Jan 7, 2026
6 checks passed

Labels

CI-Hercules-Passed (cm) Manual CI passed on Hercules


5 participants