

@JiliDong-NOAA (Contributor) commented Oct 9, 2025

Commit Queue Requirements:

  • This PR addresses a relevant WM issue (if not, create an issue).
  • All subcomponent pull requests (if any) have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines), preferably on Ursa (Derecho or Hercules are acceptable alternatives). Exceptions: documentation-only PRs, CI-only PRs, etc.
    • Commit log file w/full results from RT suite run (if applicable).
    • Verify that test_changes.list indicates which tests, if any, are changed by this PR. Commit test_changes.list, even if it is empty.
  • Fill out all sections of this template.

Description:

This PR, from @DusanJovic-NOAA and @JiliDong-NOAA, fixes RRFS/REFS restart bitwise-reproducibility issues caused by:

  1. RRFS smoke/dust components
  2. HAILCAST variables (updraft duration and mask) not being written to or read from the restart files
  3. accumulated snow water equivalent not being written to the restart file
  4. saSAS convection initialization logic (i.e., qadv) that needs to be corrected
  5. Grell-Freitas convection initialization logic that needs to be corrected (i.e., the cold-start T/q tendency should be applied only in the first time step)
  6. REFS ensemble restart reproducibility issues when running with 32-bit physics (SPP-related variable name mismatches and data-type precision inconsistencies)

It also fixes a crash when running REFS in DEBUG mode. The crash is related to LSM-SPP: the LSM-SPP perturbations appear to have been added over the whole domain without masking out water/ice points. This caused:

  1. an index-0 error in DEBUG mode for smcmin/smcmax(stype), since stype=0 over water
  2. a floating-point overflow error in DEBUG mode when applying LSM-SPP to zorll, which holds missing values (9x10e30) over water/ice

The forecast will change only when Grell-Freitas is turned on in warm-start runs with gf_coldstart explicitly set to T in the namelist.

This PR also includes a hook to output surface specific humidity, which may be needed for RRFS post-processing.

This PR addresses issue #2926.

Commit Message:

* UFSWM - [production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues
  * AQM - 
  * CDEPS - 
  * CICE - 
  * CMEPS - 
  * CMakeModules - 
  * UFSATM - [production/RRFS.v1] fix RRFS/REFS restart reproducibility
    * ccpp-physics - [production/RRFS.v1] fix RRFS/REFS restart reproducibility
    * atmos_cubed_sphere - [production/RRFS.v1] fix HAILCAST restart reproducibility
  * GOCART - 
  * HYCOM - 
  * MOM6 - 
  * NOAHMP - 
  * WW3 - 
  * fire_behavior - 
  * stochastic_physics - [production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash

Priority:

  • Critical Bugfix: Reason - This PR is for the RRFS v1 implementation. The code delivery date is set for Oct. 31.
  • High: Reason
  • Normal

Git Tracking

UFSWM:

  • Closes #

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • Blocked by #
  • None

Documentation:

  • Documentation update required.
    • Relevant updates are included with this PR.
    • A WM issue has been opened to track the need for a documentation update; a person responsible for submitting the update has been assigned to the issue (link issue).
  • Documentation update NOT required.
    • Explanation:

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.
  • New input data.
  • Updated input data.

Library Changes/Upgrades:

  • Required
    • Library names w/versions:
    • Git Stack Issue (JCSDA/spack-stack#)
  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
    • Ursa
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@github-project-automation bot moved this to Evaluating in PRs to Process Oct 9, 2025
@JiliDong-NOAA changed the title from "[production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues" to "[production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues for RRFSv1 operational implementation" Oct 9, 2025
@gspetro-NOAA (Collaborator)

@jkbk2004 @BrianCurtis-NOAA Do either of you need more information before sanity testing this production branch PR?

@gspetro-NOAA added the No Baseline Change label Oct 14, 2025
@gspetro-NOAA moved this from Evaluating to Review in PRs to Process Oct 15, 2025
@BrianCurtis-NOAA (Collaborator)

@MatthewPyle-NOAA Is there any specific testing this branch uses? I can't recall if you run the full suite on WCOSS2 and/or any other system, or rely on another testing system.

@MatthewPyle-NOAA (Collaborator)

@BrianCurtis-NOAA We typically have run the rt.conf_rrfs tests for this branch.

@BrianCurtis-NOAA (Collaborator)

Quoting @MatthewPyle-NOAA: "We typically have run the rt.conf_rrfs tests for this branch."

I don't see any evidence that these were run. @JiliDong-NOAA, have tests been run on any machine yet? Is there a machine other than WCOSS2 on which you would prefer rt.conf_rrfs to be run?

@jkbk2004 (Collaborator) commented Oct 15, 2025

@MatthewPyle-NOAA @BrianCurtis-NOAA This production branch has diverged substantially from the develop branch (spack-stack version, new machines, etc.). Note that this branch is based on spack-stack 1.5, and Hera is planned to be decommissioned around February or in the spring. We are already using Ursa. Syncing develop and this branch would take quite some work; if possible, the optimal option might be to recreate the production branch.

@gspetro-NOAA added the No Baseline Change, UFSATM, CCPP, and A3S labels Oct 31, 2025
@jkbk2004 (Collaborator) commented Nov 3, 2025

@MatthewPyle-NOAA can you rsync /lfs/h2/emc/lam/noscrub/emc.lam/RRFS.v1_RT/NEMSfv3gfs/input-data-20221101/FV3_input_data_conus13km/INPUT/sfc_data.nc to somewhere in Hera/Ursa space (scratch3/4)? It looks like a new variable, acsnow_land, was added to the file. I need to update it on the Hera/Hercules/Orion side.

@MatthewPyle-NOAA (Collaborator)

@jkbk2004 I mentioned the updated sfc_data.nc file a few days ago: "I also put an updated sfc_data.nc file into /scratch4/NCEPDEV/fv3-cam/Matthew.Pyle/ that belongs in the input-data-20221101/FV3_input_data_conus13km/INPUT/ space. Hope this helps."
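Before rerunning the RRFS regression tests against a restaged sfc_data.nc, a quick sanity check is whether the new acsnow_land variable is actually present in the copied file. The following is a minimal sketch that assumes the netCDF `ncdump` utility is installed on the machine. `has_nc_var` is a hypothetical helper name and is not part of the UFS WM regression-test scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the UFS WM scripts): succeed if the netCDF
# file declares the named variable. Assumes `ncdump` (netCDF utilities) is on
# PATH; `ncdump -h` prints only the header, so no variable data is read.
has_nc_var() {
  local file="$1" var="$2"
  # Header declarations look like "<type> <name>(<dims>) ;". Scalar variables
  # have no dimension list, so "<type> <name> ;" is accepted as well.
  # Output goes to /dev/null (rather than grep -q) so grep reads all input.
  ncdump -h "$file" | grep -E "^[[:space:]]+[a-z0-9]+ ${var}(\(| ;)" >/dev/null
}
```

For example, `has_nc_var "$INPUT_ROOT/input-data-20221101/FV3_input_data_conus13km/INPUT/sfc_data.nc" acsnow_land` should return a nonzero status if the staged file predates the new variable. Here `INPUT_ROOT` is a placeholder for the machine-specific input-data root.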

@jkbk2004 (Collaborator)

@MatthewPyle-NOAA @JiliDong-NOAA Hera is going to be decommissioned soon, and this production branch is based on spack-stack 1.5 (very old) and scratch1, so we need to skip testing on Hera. The test on Hercules passes, so you can move forward with merging. Let me know how the RRFS team wants to follow up on the Hera issue. In my opinion, the best solution is to create a v1.1 production branch from develop, so we can seamlessly support Ursa and spack-stack 1.9.2 and avoid the divergence issue. @BrianCurtis-NOAA FYI

@BrianCurtis-NOAA (Collaborator)

Quoting @jkbk2004: "hera is going to be decommissioned soon. ... We need to skip test on hera. And test on hercules passes. You can move on for merging."

If I understand this correctly, there are no issues on WCOSS2, only on other machines. As this is a production-only PR, it makes sense to me that the only important tests are the ones on WCOSS2.

I'll also reiterate what Grant mentioned earlier: are any of these changes not already in the develop branch of UFSWM?

@JiliDong-NOAA (Contributor, Author)

Is there anything we can do to move this PR forward?

@gspetro-NOAA (Collaborator)

@jkbk2004 What's the plan for this PR? Do we need additional info from @JiliDong-NOAA or others?

@jkbk2004 (Collaborator)

@MatthewPyle-NOAA @JiliDong-NOAA I pushed the Hercules test log, so you can go ahead and start merging this PR. This production branch cannot be supported on Hera. We recommend a plan to support testing on Ursa, and the spack-stack version should be updated.

@MatthewPyle-NOAA (Collaborator)

@jkbk2004 I haven't merged anything in so long that I don't really remember the sequence of steps, so I may need some guidance from you.

@MatthewPyle-NOAA (Collaborator)

@jkbk2004 I'd still like to wrap this merge up, but could use a little help on the proper steps to take. Thanks!

@gspetro-NOAA (Collaborator) commented Jan 20, 2026

@MatthewPyle-NOAA You would start with the lowest level PR(s). In this case, you could start with CCPP physics or atmos_cubed_sphere because they are at the same level, but I prefer to do one at a time. Once that PR is merged, you modify .gitmodules in the next level up (e.g., UFSATM/FV3) to point to the production branch. Then you update the subcomponent hash. For example, if you start with atmos_cubed_sphere, you'd have NOAA-EMC/GFDL_atmos_cubed_sphere#91 merged into production/RRFS.v1. Then you'd update the UFSATM .gitmodules file and update the atmos_cubed_sphere hash:

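cd ufs-weather-model/FV3/atmos_cubed_sphere
# Add the upstream repository as a remote and fetch its branches
git remote add upstream https://github.com/NOAA-EMC/GFDL_atmos_cubed_sphere
git fetch upstream
# Check out the tip of the merged production branch (detached HEAD)
git checkout upstream/production/RRFS.v1
cd ..
# Stage the edited .gitmodules and the new atmos_cubed_sphere hash in the parent repo
git add .gitmodules atmos_cubed_sphere
git commit -m "<commit_message_here>"
git push origin production/RRFS.v1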

You'll probably have to check the actual paths; it looks like the PR might still be using FV3 instead of UFSATM, so adjust the paths accordingly. I tried to do that here, but I might've missed something (develop uses UFSATM/fv3 instead of FV3).

You'd repeat the process for CCPP and then move up a level to merge the UFSATM and stochastic physics PRs.
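The per-subcomponent steps above can be wrapped in a small shell function so the same sequence is repeated for each subcomponent before moving up a level. This is a sketch under stated assumptions: `bump_submodule` is a hypothetical name, it needs git 2.25 or newer (for `git submodule set-url`), and it uses `set-url`/`set-branch` as one way to make the .gitmodules edit described above, rewriting both the URL and the branch entries. It stops after staging and leaves the commit and push to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the UFS WM tooling): point a submodule at an
# upstream branch, check out that branch's tip, and stage the new hash.
# Usage: bump_submodule <parent_repo> <submodule_path_in_parent> <upstream_url> <branch>
# Requires git >= 2.25 for `git submodule set-url`.
bump_submodule() {
  local parent="$1" sub="$2" url="$3" branch="$4"
  # Record the upstream URL and branch in the parent's .gitmodules.
  git -C "$parent" submodule set-url -- "$sub" "$url" || return 1
  git -C "$parent" submodule set-branch --branch "$branch" -- "$sub" || return 1
  # Inside the submodule: add an "upstream" remote if missing, then fetch it.
  if ! git -C "$parent/$sub" remote get-url upstream >/dev/null 2>&1; then
    git -C "$parent/$sub" remote add upstream "$url" || return 1
  fi
  git -C "$parent/$sub" fetch -q upstream || return 1
  # Detach at the tip of the upstream branch (same as the checkout step above).
  git -C "$parent/$sub" checkout -q --detach "upstream/$branch" || return 1
  # Stage the .gitmodules change and the new submodule hash in the parent repo.
  git -C "$parent" add .gitmodules "$sub"
}
```

For example, from the top of a ufs-weather-model clone that still uses the FV3 layout, `bump_submodule FV3 atmos_cubed_sphere https://github.com/NOAA-EMC/GFDL_atmos_cubed_sphere production/RRFS.v1` followed by `git -C FV3 commit -m "<commit_message_here>"` would reproduce the atmos_cubed_sphere step. Taking the parent repository as an explicit argument is meant to also cover nested paths such as CCPP physics (assuming the usual layout, ccpp/physics inside FV3); its upstream URL is not given in this thread.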

@MatthewPyle-NOAA self-requested a review January 20, 2026 17:40

@MatthewPyle-NOAA (Collaborator) left a comment

Approving, but I still want the testing file (rrfs_for_testing_do_not_merge.conf) removed before merging.

@jkbk2004 (Collaborator)

@JiliDong-NOAA @MatthewPyle-NOAA new hash for the stochastic physics production branch is NOAA-PSL/stochastic_physics@e8d56dd

@MatthewPyle-NOAA (Collaborator) left a comment

Thanks for helping to wrap this one up at long last, @JiliDong-NOAA

@MatthewPyle-NOAA (Collaborator)

Quoting @grantfirl: "Do the changes represented in this PR need to make their way back to the develop branch?"

@grantfirl Sorry for not answering this question months ago. I think these changes would be nice to have in develop, as they make the code more robust (able to restart and give bitwise-identical answers, etc.). Would you agree, @JiliDong-NOAA?

@MatthewPyle-NOAA (Collaborator)

@BrianCurtis-NOAA or @jkbk2004, this one needs a second approval if either of you is comfortable with it.

@BrianCurtis-NOAA (Collaborator)

All changes make sense, and I trust that the RRFS team has put a lot of work into verifying that these changes introduce no new bugs in the develop branch of the UFS Weather Model, or will be putting that work in ASAP.

@MatthewPyle-NOAA merged commit 8b73ad3 into ufs-community:production/RRFS.v1 Jan 20, 2026
1 check passed
@github-project-automation bot moved this from Schedule to Done in PRs to Process Jan 20, 2026

Labels

A3S, CCPP, No Baseline Change, UFSATM

Projects

Archived in project

8 participants