
Conversation

@climbfuji (Collaborator) commented Jul 25, 2025

Bug fixes for ccpp_prebuild to work with partially case-insensitive capgen parser

These updates are needed to make ccpp_prebuild.py work with the recent, partially complete case-insensitive capgen parser. I tested this with NEPTUNE in a rather complicated way: pulling develop into the branch NEPTUNE uses (which is based on main), creating the bug fixes there, then cherry-picking them so that we can merge them into develop here. Hopefully, by the time this all comes back to NEPTUNE it will still work :-)
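To illustrate what case-insensitive means here (a minimal sketch only, not the actual ccpp_prebuild.py code; the function and table names below are made up), standard-name comparisons and lookups have to fold case on both sides:

def standard_names_match(name_a, name_b):
    """Compare two CCPP standard names case-insensitively."""
    return name_a.lower() == name_b.lower()

def find_variable(requested_name, metadata_table):
    """Look up a variable record by standard name, ignoring case.
    metadata_table maps standard names (in any case) to records."""
    lowered = {name.lower(): record for name, record in metadata_table.items()}
    return lowered.get(requested_name.lower())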

This PR needs to be merged into develop, then #668 must be updated before it can be merged into main.

User interface changes?: no - but prebuild is now case-insensitive

Fixes: no separate issue created, see discussion in #668

Testing:
tests removed: none
unit tests: all pass
system tests: all pass
manual testing: full regression testing with NEPTUNE underway; would like to see UFS testing, too (@dustinswales?)

@climbfuji climbfuji marked this pull request as ready for review July 25, 2025 19:50
@climbfuji climbfuji changed the title from "DRAFT Bugfix/prebuild case insensitive capgen parser" to "DRAFT Bug fixes for ccpp_prebuild to work with partially case-insensitive capgen parser" Jul 25, 2025
@climbfuji climbfuji self-assigned this Jul 25, 2025
@climbfuji climbfuji added the bugfix label Jul 25, 2025
@climbfuji climbfuji added this to the capgen unification milestone Jul 25, 2025
@dustinswales (Member)

@climbfuji I can't test this since #668 is not working in the UFS :(.

@mwaxmonsky (Collaborator) left a comment

Just a few design/Python questions.

@gold2718 (Collaborator) left a comment

Mostly okay, just a minor Fortran nit.

@climbfuji (Collaborator, Author)

> @climbfuji I can't test this since #668 is not working in the UFS :(.

This is the whole point of this PR. Use #668 as a base and pull in the changes from here (#669).

@climbfuji climbfuji changed the title DRAFT Bug fixes for ccpp_prebuild to work with partially case-insensitive capgen parser Bug fixes for ccpp_prebuild to work with partially case-insensitive capgen parser Jul 28, 2025
@climbfuji climbfuji requested a review from gold2718 July 28, 2025 13:04
@dustinswales (Member)

> @climbfuji I can't test this since #668 is not working in the UFS :(.

> This is the whole point of this PR. Use #668 as a base and pull in the changes from here (#669).

Facepalm
Testing now

@climbfuji (Collaborator, Author)

@dustinswales I found an interesting problem with the capgen/prebuild updates. We need to look at how the CCPP_interstitial DDTs are allocated. The implementation in NEPTUNE, at least, still relies in part on the old blocked data structures, and this breaks when more than one thread is used, now that horizontal_loop_extent is no longer allowed in the host model.

@dustinswales (Member) commented Jul 28, 2025

> @dustinswales I found an interesting problem with the capgen/prebuild updates. We need to look at how the CCPP_interstitial DDTs are allocated. The implementation in NEPTUNE, at least, still relies in part on the old blocked data structures, and this breaks when more than one thread is used, now that horizontal_loop_extent is no longer allowed in the host model.

@climbfuji That's interesting. Not sure if I understand the details completely. I'm stuck on something else...

I (think) I got through all of the metadata changes I needed, but I'm running into a new error when building:
ld: physics/libccpp_physics.a(ccpp_fv3_gfs_v17_coupled_p8_phys_ps_cap.F90.o): relocation R_X86_64_32S against symbol `ccpp_fv3_gfs_v17_coupled_p8_phys_ps_cap_mp_initialized_' can not be used when making a shared object; recompile with -fPIC

I think this has something to do with case sensitivity, but I haven't figured out all the details yet.

@climbfuji (Collaborator, Author)

> @dustinswales I found an interesting problem with the capgen/prebuild updates. We need to look at how the CCPP_interstitial DDTs are allocated. The implementation in NEPTUNE, at least, still relies in part on the old blocked data structures, and this breaks when more than one thread is used, now that horizontal_loop_extent is no longer allowed in the host model.

> @climbfuji That's interesting. Not sure if I understand the details completely. I'm stuck on something else...

> I (think) I got through all of the metadata changes I needed, but I'm running into a new error when building: ld: physics/libccpp_physics.a(ccpp_fv3_gfs_v17_coupled_p8_phys_ps_cap.F90.o): relocation R_X86_64_32S against symbol `ccpp_fv3_gfs_v17_coupled_p8_phys_ps_cap_mp_initialized_' can not be used when making a shared object; recompile with -fPIC

> I think this has something to do with case sensitivity, but I haven't figured out all the details yet.

You need to update the calling CMakeLists.txt that includes ccpp-framework so that it builds a static library. I am doing this in NEPTUNE:

# Force a static build of the CCPP framework
set(BUILD_SHARED_LIBS OFF)
add_subdirectory(ccpp-framework)

I think this is safe for the UFS, too. But if you have something else setting this variable higher up, then you need this:

# Save the caller's setting, force a static build for the framework, then restore it
set(BUILD_SHARED_LIBS_SAVE ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
add_subdirectory(ccpp-framework)
set(BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS_SAVE})
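(The save/restore matters because a variable set in the parent CMakeLists.txt is seen by every add_subdirectory call that follows, so without restoring it, BUILD_SHARED_LIBS would stay OFF for any components added after ccpp-framework.)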

@climbfuji (Collaborator, Author)

Another update: I got the UFS code to run with the updated GFS_interstitial DDT; it is b4b between omp=1 and omp=2. I still need to check on memory footprint and performance, but at least I have a working solution now that the capgen parser refuses horizontal_loop_extent in the host model metadata's horizontal dimensions.

@climbfuji (Collaborator, Author)

> Another update: I got the UFS code to run with the updated GFS_interstitial DDT; it is b4b between omp=1 and omp=2. I still need to check on memory footprint and performance, but at least I have a working solution now that the capgen parser refuses horizontal_loop_extent in the host model metadata's horizontal dimensions.

Good news is that so far, no further changes are required to ccpp-framework (i.e., this PR).

@mwaxmonsky (Collaborator) commented Jul 28, 2025

@climbfuji Would it make sense to change the framework's BUILD_SHARED_LIBS option to CCPP_BUILD_SHARED_LIBS?

option(CCPP_BUILD_SHARED_LIBS "Build using shared libraries" ON)

set(BUILD_SHARED_LIBS ${CCPP_BUILD_SHARED_LIBS})

Then in the parent level CMake, we can do:

set(CCPP_BUILD_SHARED_LIBS OFF) # Or comment out if default for the framework is fine 
add_subdirectory(ccpp-framework)

If I understand CMake correctly, this way there shouldn't be a need to save the parent project's variable.

@climbfuji (Collaborator, Author)

> @climbfuji Would it make sense to change the framework's BUILD_SHARED_LIBS option to CCPP_BUILD_SHARED_LIBS?
>
> option(CCPP_BUILD_SHARED_LIBS "Build using shared libraries" ON)
>
> set(BUILD_SHARED_LIBS ${CCPP_BUILD_SHARED_LIBS})
>
> Then in the parent level CMake, we can do:
>
> set(CCPP_BUILD_SHARED_LIBS OFF) # Or comment out if default for the framework is fine
> add_subdirectory(ccpp-framework)
>
> If I understand CMake correctly, this way there shouldn't be a need to save the parent project's variable.

It's no problem at all to make the code work with the current name. I think it is actually better not to prefix every variable with CCPP_. If we were to do this for every component that goes into a model, we'd be dealing with dozens of variables to set. For instance, if we were to also do this for OPENMP, then, in order to turn on OpenMP, we would have to set UFS_OPENMP, CCPP_OPENMP, MOM6_OPENMP, ... instead of just OPENMP.

HDF5, for example, also does not prefix every CMake variable with HDF5_ ...

@mkavulich (Collaborator) left a comment

@climbfuji I'll be on PTO until next Tuesday, so once Dustin gives his approval feel free to merge this yourself if I'm not back yet.

@climbfuji (Collaborator, Author)

@dustinswales I feel like we should merge this given that all but one of your UFS tests pass with b4b identical results and that the one remaining test may differ because of a compiler optimization or something else UFS-related. Not merging this PR is holding back the update of main from develop, which in turn holds back updating NEPTUNE and other models. If really needed, we can always apply another bug fix to the develop branch of ccpp-framework?

@dustinswales (Member)

@climbfuji Merging this in is fine with me.
I still haven't found the cause of the one UFS RT that is failing...

@climbfuji (Collaborator, Author)

> @climbfuji Merging this in is fine with me. I still haven't found the cause of the one UFS RT that is failing...

Can you remind us again what "failing" means (we didn't quite remember yesterday in the meeting). Is it that the tests are b4b different? If so, for release builds only, or also for debug builds? Or is the code crashing?

@dustinswales (Member)

@climbfuji The RRTMGP test is not b4b; release build only, there is no debug test for GP.
The test runs to completion, but answers change after the first time step. I spent the better part of last week on this and cannot find a reason.

@climbfuji (Collaborator, Author)

> @climbfuji The RRTMGP test is not b4b; release build only, there is no debug test for GP. The test runs to completion, but answers change after the first time step. I spent the better part of last week on this and cannot find a reason.

Ok, I recall suggesting last week to run the RRTMGP test in DEBUG mode for the current develop branch and for your up-to-date branch. If the results match between the two, then I think you can be fairly certain that this is because of the compiler optimization.

@climbfuji (Collaborator, Author)

And I am very certain that the changes here that deal with case sensitivity / case insensitivity have nothing to do with the RRTMGP b4b differences ...

@dustinswales (Member)

> Ok, I recall suggesting last week to run the RRTMGP test in DEBUG mode for the current develop branch and for your up-to-date branch. If the results match between the two, then I think you can be fairly certain that this is because of the compiler optimization.

I'm looking into this now.

@dustinswales (Member)

> And I am very certain that the changes here that deal with case sensitivity / case insensitivity have nothing to do with the RRTMGP b4b differences ...

This is true.
The allocation/resetting/cleanup of the interstitial type on the host side is where the problem arises. If I keep all the host-side changes and revert the framework hash, I get the same results.
Something about the allocation/resetting/cleanup of the interstitial is not behaving the same as before?

@climbfuji (Collaborator, Author)

> And I am very certain that the changes here that deal with case sensitivity / case insensitivity have nothing to do with the RRTMGP b4b differences ...

> This is true. The allocation/resetting/cleanup of the interstitial type on the host side is where the problem arises. If I keep all the host-side changes and revert the framework hash, I get the same results. Something about the allocation/resetting/cleanup of the interstitial is not behaving the same as before?

Oh wow. Can you diff the auto-generated files? You'd probably have to convert everything to lowercase using tr or similar to be able to do that.
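For example (a sketch of the same idea in Python rather than tr; the file paths are hypothetical):

import difflib

def diff_ignoring_case(path_old, path_new):
    """Diff two auto-generated caps after folding both to lowercase,
    so that differences which are purely letter case disappear."""
    with open(path_old) as f_old, open(path_new) as f_new:
        old_lines = [line.lower() for line in f_old]
        new_lines = [line.lower() for line in f_new]
    return "".join(difflib.unified_diff(old_lines, new_lines,
                                        fromfile=path_old, tofile=path_new))

# Hypothetical usage:
# print(diff_ignoring_case("develop/ccpp_cap.F90", "branch/ccpp_cap.F90"))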

@climbfuji (Collaborator, Author) commented Aug 20, 2025

> And I am very certain that the changes here that deal with case sensitivity / case insensitivity have nothing to do with the RRTMGP b4b differences ...

> This is true. The allocation/resetting/cleanup of the interstitial type on the host side is where the problem arises. If I keep all the host-side changes and revert the framework hash, I get the same results. Something about the allocation/resetting/cleanup of the interstitial is not behaving the same as before?

> Oh wow. Can you diff the auto-generated files? You'd probably have to convert everything to lowercase using tr or similar to be able to do that.

Maybe those RRTMGP DDTs inside the interstitial DDT don't get cleaned up correctly? Scratch that; then reverting the ccpp-framework hash wouldn't help.

@dustinswales (Member)

@climbfuji I diff'd the Caps and the only change was from 1:IM -> ixs:ixe.
Also, we don't use the RRTMGP DDTs as interstitials anymore in the UWM. I know CAM-SIMA wants to (@peverwhee and #674).

@dustinswales (Member)

> Ok, I recall suggesting last week to run the RRTMGP test in DEBUG mode for the current develop branch and for your up-to-date branch. If the results match between the two, then I think you can be fairly certain that this is because of the compiler optimization.

> I'm looking into this now.

@climbfuji Differences occur in DEBUG mode. Snap.

@climbfuji (Collaborator, Author) commented Aug 20, 2025

> @climbfuji I diff'd the Caps and the only change was from 1:IM -> ixs:ixe. Also, we don't use the RRTMGP DDTs as interstitials anymore in the UWM. I know CAM-SIMA wants to (@peverwhee and #674).

For all phases except the run phase, the correct indices for the UFS are 1:GFS_control%ncols. Can you confirm that ixs=1 and ixe=IM=GFS_control%ncols in these cases? For the run phase, you would expect ixs:ixe to be

Model%chunk_begin(ib):Model%chunk_end(ib)

for the different ib (1 to nblocks). It is probably worth printing out these indices and also checking whether there is anything in the RRTMGP CCPP scheme entry points that could cause an inconsistency with respect to the ranges.
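As a quick sanity check of that expectation (an illustrative Python sketch with made-up block sizes; in the UFS these arrays live in the Fortran Model/GFS_control DDT), the per-block ranges chunk_begin(ib):chunk_end(ib) should tile 1:ncols exactly, with no gaps or overlaps:

def chunks_tile_domain(chunk_begin, chunk_end, ncols):
    """Return True if the per-block ranges cover 1..ncols exactly, in order."""
    covered = []
    for begin, end in zip(chunk_begin, chunk_end):
        covered.extend(range(begin, end + 1))
    return covered == list(range(1, ncols + 1))

# Example: 8 columns split into blocks of 3, 3, and 2.
assert chunks_tile_domain([1, 4, 7], [3, 6, 8], 8)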

@dustinswales (Member)

> @climbfuji I diff'd the Caps and the only change was from 1:IM -> ixs:ixe. Also, we don't use the RRTMGP DDTs as interstitials anymore in the UWM. I know CAM-SIMA wants to (@peverwhee and #674).

> For all phases except the run phase, the correct indices for the UFS are 1:GFS_control%ncols. Can you confirm that ixs=1 and ixe=IM=GFS_control%ncols in these cases? For the run phase, you would expect ixs:ixe to be
>
> Model%chunk_begin(ib):Model%chunk_end(ib)
>
> for the different ib (1 to nblocks). It is probably worth printing out these indices and also checking whether there is anything in the RRTMGP CCPP scheme entry points that could cause an inconsistency with respect to the ranges.

Yeah, I've checked all of these things without any success.
I've been all over the GP interface and it doesn't do anything different from any of the other schemes wrt indexing.
The same chunked data as before is passing through the Caps, and all the schemes are fine with it except for GP?
I have no clue what's going on.

@climbfuji (Collaborator, Author)

Then I suggest merging and continuing the investigation with the (soon to be) updated PR #668.

@climbfuji climbfuji merged commit 4ae528b into NCAR:develop Aug 21, 2025
19 checks passed
@climbfuji climbfuji deleted the bugfix/prebuild_case_insensitive_capgen_parser branch August 21, 2025 18:06