ML correction on CPU and GPU #2909
This reverts commit b4d7445.
This commit fixes issues with the implementation of precipitation adjustment in ML when running on GPUs. Additionally, this commit turns on property checks to ensure that ML cannot produce an unrealistic state.
This reverts commit fcb5ee3.
Option to do ML correction on temperature only
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Test Name: SCREAM_PullRequest_Autotester_Mappy
Test Name: SCREAM_PullRequest_Autotester_Weaver
Pull Request Author: elynnwu
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
SCREAM_PullRequest_Autotester_Mappy # 5695 PASSED
SCREAM_PullRequest_Autotester_Weaver # 5931 FAILED
NOTICE: The AutoTester has encountered an internal error (usually a Communications Timeout), testing will be restarted, previous tests may still be running but will be ignored by the AutoTester...
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED
Pull Request Auto Testing has PASSED
Test Name: SCREAM_PullRequest_Autotester_Mappy
Test Name: SCREAM_PullRequest_Autotester_Weaver
Status Flag 'Pre-Merge Inspection' - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
All Jobs Finished; status = PASSED, target_sha=6891ee3cb825adf849cab1238f2f6fc7bbc3217d, However Inspection must be performed before merge can occur...
All Jobs Finished; status = PASSED, target_sha=308996be7151e6b08edf5c9a3d2e7925a001a806, However Inspection must be performed before merge can occur...
All Jobs Finished; status = PASSED, target_sha=5e7b019a3a529d817a464557e473fecb2a2b67d0, However Inspection must be performed before merge can occur...
All Jobs Finished; status = PASSED, target_sha=9d845ad6c53611729bee876309f2c006c81c2493, However Inspection must be performed before merge can occur...
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
SCREAM_PullRequest_Autotester_Mappy # 5735 FAILED
SCREAM_PullRequest_Autotester_Weaver # 5966 PASSED
@@ -5,6 +5,9 @@ include (${EKAT_MACH_FILES_PATH}/kokkos/amd-zen3.cmake)
include (${EKAT_MACH_FILES_PATH}/kokkos/openmp.cmake)

set(CMAKE_CXX_FLAGS "-DTHRUST_IGNORE_CUB_VERSION_CHECK" CACHE STRING "" FORCE)
set(PYBIND11_PYTHON_VERSION 3.9 CACHE STRING "")
@elynnwu or @frodre, I think there is consensus that we can merge this in as long as we put guard rails on the environment call. Can you add a compiler flag that turns these options "OFF" by default and "ON" only when the user specifies they want ML? I would be in favor of making the flag's name explicit, like FV3NET or CorrectiveML. You would want to add the flag to both the GPU and CPU config files.
@mahf708 have I characterized what needs to be done correctly?
Yeah, something like this should be okay:
option (SCREAM_ENABLE_ML_CORRECTION "Whether to enable ML correction parametrization" OFF)
if (SCREAM_ENABLE_ML_CORRECTION)
set(PYBIND11_PYTHON_VERSION 3.9 CACHE STRING "")
endif()
You can add whatever you want inside the guarded if–endif.
To activate SCREAM_ENABLE_ML_CORRECTION, you can add it to the scream configs in a run script (with ./xmlchange SCREAM_CMAKE_OPTIONS="... SCREAM_ENABLE_ML_CORRECTION ON ...") or pass it directly to the cmake call (if you're building this at a lower level) with -DSCREAM_ENABLE_ML_CORRECTION=ON or the like.
Closing this until we finalize the ML approach
In this PR, we add the capability of running ML correction on GPU. The call into the Python code differs between the two targets because of how pybind11 handles the exchange. On CPU, we continue to rely on pybind11's NumPy integration to pass the array pointer. On GPU, we pass the raw pointer and rebuild it as a CuPy array on the Python side (the GPU allows unmanaged memory access). The rest of the code is unchanged, since xarray can work with both NumPy and CuPy arrays. As a result, the actual calls that perform the ML correction are identical between CPU and GPU.
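The pointer exchange described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names `wrap_host_pointer`, `wrap_device_pointer`, and `apply_increment` are hypothetical, the host path uses `ctypes` as a stand-in for pybind11's NumPy integration, and xarray is left out to keep the example short. The device path assumes CuPy is installed and that the pointer refers to memory on the current device.

```python
import ctypes

import numpy as np


def wrap_host_pointer(ptr, shape, dtype=np.float64):
    """View existing host memory at integer address `ptr` as a NumPy array (no copy)."""
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    # ctypes exposes the raw bytes; np.frombuffer reinterprets them in place.
    buf = (ctypes.c_byte * nbytes).from_address(ptr)
    return np.frombuffer(buf, dtype=dtype).reshape(shape)


def wrap_device_pointer(ptr, shape, dtype=np.float64):
    """View existing device memory at `ptr` as a CuPy array (no copy).

    CuPy is imported lazily so the host path works without a GPU.
    The caller keeps ownership of the memory (owner=None).
    """
    import cupy as cp

    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    mem = cp.cuda.UnownedMemory(ptr, nbytes, owner=None)
    memptr = cp.cuda.MemoryPointer(mem, 0)
    return cp.ndarray(shape, dtype=dtype, memptr=memptr)


def apply_increment(field, tendency, dt):
    """Hypothetical correction step: identical for NumPy and CuPy inputs.

    Updates `field` in place so the caller's buffer sees the change.
    """
    field += tendency * dt
    return field
```

Once the array is wrapped, the same `apply_increment` call works on either backend, which is the property the description relies on: only the wrapping step differs between CPU and GPU.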
We also introduce a few features in this PR:
sfc_flux as well as sfc_flux_sw_net and sfc_flux_sw_dn
We have also started using Perlmutter CPU and GPU as our main machines for ML corrective work. A shared Python environment is now maintained at:
/global/common/software/m4492/fv3net-shared-py39