Modify gsibec workarounds to fix bug with linear variable change #527
base: develop
FAILED on hera: started build_and_test on hera at UTC time: Wed Jan 28 02:27:11 UTC 2026, workdir: /scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/527
PASSED on wcoss2: started build_and_test on wcoss2 at UTC time: Wed Jan 28 02:22:36 UTC 2026, workdir: /lfs/h2/emc/da/noscrub/samuel.degelia/rrfsbot/PRs_RDASApp/527
Converting to draft. For some reason, it looks like this change is actually causing NaNs in the cost function on Hera while resolving them on WCOSS2...
ShunLiu-NOAA left a comment:
I realized some ctests did not pass, so I am reverting my approval.
It looks like the Hera failures are caused by NaNs that now appear in the second outer loop. So these changes resolve the NaNs during the first outer loop for the
Yes, the NaN values in the cost function could come from NaN or otherwise undefined values in the background fields, produced by undefined behavior (when compiler optimization is turned on) when filtering on grid points outside the model domain. In the regional gsibec this could happen even earlier, because of the lateral boundary points, and earlier than @Masanori-NOAA once suspected: for example, in the ges_prsl calculation in gsibec's guess_grids.f90. I am still trying to figure out a simple way to deal with this situation. I will focus on this issue, in collaboration with Masanori and colleagues, after I finish the current optimization of the MGBF code.
Thanks, @TingLei-NOAA!
I added some debug prints to track down the source of the NaNs in the minimizer. Tracing through various layers, I found that the NaNs originate in As a simple hardening step, I added additional checks in After changing this if-block, 3dvar now runs to completion on Hera. The minimization results are slightly different after this change, though (e.g., a different reduction of the residual norm). I am going to make some plots to see how similar the analyses are and whether this fix is okay.
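The layer-by-layer tracing described above (checking each stage's output until the first non-finite value shows up) can be sketched as follows. This is a minimal Python illustration of the idea, not the actual debug prints added to gsibec (which is Fortran); the helpers `first_nonfinite` and `trace_stages` and the stage names in the example are hypothetical.

```python
import math


def first_nonfinite(field):
    """Return (count, first_index) of NaN/Inf entries in a flat sequence of floats.

    first_index is None when every value is finite.
    """
    count = 0
    first = None
    for i, value in enumerate(field):
        if not math.isfinite(value):
            count += 1
            if first is None:
                first = i
    return count, first


def trace_stages(stages):
    """Walk hypothetical (stage_name, output_field) pairs in call order.

    Prints a report and returns the name of the first stage whose output
    contains a NaN or Inf, or None if every stage is clean.
    """
    for name, field in stages:
        count, index = first_nonfinite(field)
        if count:
            print(f"{name}: {count} non-finite value(s), first at index {index}")
            return name
    return None


# Hypothetical stage outputs: the NaN first appears in "transform", so that is
# where debugging should focus, not in the cost function where it is noticed.
stages = [
    ("background", [1.0, 2.0, 3.0]),
    ("transform", [0.5, float("nan"), 1.5]),
    ("cost_function", [float("nan")]),
]
culprit = trace_stages(stages)
```

Stopping at the first offending stage matters because once a NaN enters a reduction (such as a cost-function sum), every downstream value is NaN and no longer points to the origin.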
3dvar ran through after the above changes, but the analyses are very different. Going to continue debugging.
Description
This PR modifies the gsibec workarounds to force zero values at the outer analysis grid points in the linear variable change section. With this approach, it is no longer necessary to fill background values into the analysis grid points that have missing values.
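The zero-forcing idea can be illustrated with a small sketch. For illustration only, the linear variable change is modeled here as a pointwise scaling `out[i] = scale[i] * dx[i]`, and `inside` is a hypothetical mask marking analysis grid points within the model domain; the real gsibec operator is Fortran and more involved. The detail worth noting is that outside points are assigned zero rather than multiplied by a 0/1 mask, because under IEEE 754 arithmetic `0.0 * nan` is still NaN.

```python
import math


def linear_variable_change(dx, scale, inside):
    """Hypothetical pointwise linear variable change.

    Returns scale[i] * dx[i] at points inside the model domain and exactly
    0.0 at points outside it, regardless of what scale[i] holds there.
    """
    out = []
    for d, s, ok in zip(dx, scale, inside):
        # Assign zero instead of multiplying by a 0/1 mask: an undefined
        # (NaN) background-derived coefficient would survive 0.0 * nan.
        out.append(s * d if ok else 0.0)
    return out


dx = [1.0, 2.0, 3.0]
scale = [2.0, float("nan"), 0.5]  # NaN stands in for an undefined value outside the domain
inside = [True, False, True]

forced = linear_variable_change(dx, scale, inside)
masked = [s * d * (1.0 if ok else 0.0) for d, s, ok in zip(dx, scale, inside)]

print(forced)  # [2.0, 0.0, 1.5]
print(masked)  # [2.0, nan, 1.5]
# A cost-function-like sum of squares stays finite only with the forced-zero version.
print(math.isfinite(sum(v * v for v in forced)), math.isfinite(sum(v * v for v in masked)))  # True False
```

If the operator also has an adjoint, the same outside points would presumably need to be zeroed there as well so that the tangent-linear and adjoint code stay consistent.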
These modifications are needed to prevent NaNs in the cost function when running 3dvar on the na3km domain. Note that we are still seeing some issues with NaNs, which can be avoided by limiting 3dvar to a single outer loop. More changes to the gsibec code will likely be needed to resolve this. For now, though, this PR allows us to at least run one outer loop and start getting results. Huge thanks to @Masanori-NOAA for debugging this problem and finding an (at least partial) solution.
Issue(s) addressed
None
Dependencies (if applicable)
None
Checklist