-
Notifications
You must be signed in to change notification settings - Fork 276
[production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues for RRFSv1 operational implementation #2925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues for RRFSv1 operational implementation #2925
Conversation
Rrfs v1 conus13km tests
* update for rrfsens restart bitwise reproducibility --------- Co-authored-by: jili dong <[email protected]> Co-authored-by: jili dong <[email protected]> Co-authored-by: jili dong <[email protected]>
REFS ensemble restart fix (ufs-community#43)
|
@jkbk2004 @BrianCurtis-NOAA Do either of you need more information before sanity testing this production branch PR? |
|
@MatthewPyle-NOAA Is there any specific testing this branch uses? I can't recall if you run the full suite on WCOSS2 and/or any other system, or rely on another testing system. |
|
@BrianCurtis-NOAA We typically have run the rt.conf_rrfs tests for this branch. |
I don't see any evidence that these were run. @JiliDong-NOAA were tests run on any machine, yet? Is there a machine not named WCOSS2 that you prefer rt.conf_rrfs is run on? |
|
@MatthewPyle-NOAA @BrianCurtis-NOAA This production branch is really getting diverged from develop branch: spack stack and new machine, etc. Note that this branch is based on spack stack 1.5 and there is decommission plan for hera: around Feb or Spring time. We are already using Ursa. Quite some work to sync between develop and this branch. If possible, optimal option might be recreating a production branch. |
|
@MatthewPyle-NOAA can you rsync /lfs/h2/emc/lam/noscrub/emc.lam/RRFS.v1_RT/NEMSfv3gfs/input-data-20221101/FV3_input_data_conus13km/INPUT/sfc_data.nc to somewhere hera/ursa space: scratch3/4? It looks like new variable is added in the file: acsnow_land. I need to update on hera/hercules/orion side. |
|
@jkbk2004 I mentioned the updated sfc_data.nc file a few days ago "I also put an updated sfc_data.nc file into /scratch4/NCEPDEV/fv3-cam/Matthew.Pyle/ that belongs in the input-data-20221101/FV3_input_data_conus13km/INPUT/ space. Hope this helps." |
|
@MatthewPyle-NOAA @JiliDong-NOAA hera is going to be decommissioned soon. And this production branch is based on spack stack 1.5 (way old) and scratch1. We need to skip test on hera. And test on hercules passes. You can move on for merging. Let me know how RRFS team want to follow up hera issue. In my opinion, best solution is to create v1.1 production branch out of develop. So we can seamlessly support ursa and spack stack 1.9.2 and avoid divergence issue. @BrianCurtis-NOAA FYI |
If I understand this correctly, there are no issue with WCOSS2 and only issues on other machines. As this is a production only PR, it makes sense to me that the only important tests are the ones on WCOSS2. I'll also reiterate what Grant mentioned earlier: Are there any of these changes that are not already in develop branch of UFSWM? |
|
is there anything we can do to move this PR forward? |
|
@jkbk2004 What's the plan for this PR? Do we need additional info from @JiliDong-NOAA or others? |
|
@MatthewPyle-NOAA @JiliDong-NOAA I pushed hercules test log. you can go ahead to start merging this pr. This production branch is not supportable on hera. A plan is recommended to support the test on Ursa. Spack stack version should be updated. |
|
@jkbk2004 I haven't merged anything in so long I don't really remember the sequence of steps, so may need some guidance from you. |
|
@jkbk2004 I'd still like to wrap this merge up, but could use a little help on the proper steps to take. Thanks! |
|
@MatthewPyle-NOAA You would start with the lowest level PR(s). In this case, you could start with CCPP physics or atmos_cubed_sphere because they are at the same level, but I prefer to do one at a time. Once that PR is merged, you modify You'll probably have to check the actual paths--it looks like the PR might still be using FV3 instead of UFSATM, so adjust the paths accordingly. I tried to do it here, but I might've missed something (develop uses UFSATM/fv3 instead of FV3). You'd repeat the process for CCPP and then move up a level to merge the UFSATM and stochastic physics PRs. |
MatthewPyle-NOAA
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, but still want the testing file (rrfs_for_testing_do_not_merge.conf) removed before merging.
|
@JiliDong-NOAA @MatthewPyle-NOAA new hash for the stochastic physics production branch is NOAA-PSL/stochastic_physics@e8d56dd |
MatthewPyle-NOAA
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping to wrap this one up at long last, @JiliDong-NOAA
@grantfirl Sorry for not answering this question months ago. I think these changes would be nice to have in develop, as they make the code more robust (able to restart and give bit-wise identical answers, etc.). Would you agree, @JiliDong-NOAA? |
|
@BrianCurtis-NOAA or @jkbk2004 , this one needs a second approval if either are you are comfortable with it. |
|
All changes make sense, and I trust that RRFS team has put a lot of work into verifying that no new bugs come from these changes in the develop branch of the UFS-Weather-Model or will be putting that work in ASAP. |
8b73ad3
into
ufs-community:production/RRFS.v1
Commit Queue Requirements:
test_changes.listindicates which tests, if any, are changed by this PR. Committest_changes.list, even if it is empty.Description:
This PR is from @DusanJovic-NOAA and @JiliDong-NOAA and it fixes RRFS/REFS restart bitwise reproducibility issues caused by:
It also fixes crash when running REFS under DEBUG mode
The issues are related to LSM-SPP. It appears that LSM-SPP perturbations were added to the whole domain without masking out the water/ice points. This caused:
The forecast will only change when Grell-Freitas is turned on during warm start runs with gf_coldstart being explicitly set to T in the namelist
This PR also includes a hook to output surface specific humidity, which may be needed for RRFS post-processing.
The PR address issue #2926
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Documentation:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: