aigefs: add a new component and prep scripts #915

Open
GwenChen-NOAA wants to merge 4 commits into NOAA-EMC:feature/add_ai from GwenChen-NOAA:aigefs

Conversation

@GwenChen-NOAA
Contributor

Description of Changes

This PR adds a new component, aigefs, to EVS, along with its associated prep scripts.

Developer Questions and Checklist

  • Is this a high priority PR? If so, why and is there a date it needs to be merged by? Yes.
  • Do you have any planned upcoming annual leave/PTO? Yes, on 3/6.
  • Are there any changes needed in the times when the jobs are supposed to run/kick-off? A new job is being added.
  • The code changes follow NCO's EE2 Standards.
  • Developer's name is removed throughout the code, and ${USER} is used where necessary.
  • References to the feature branch for HOMEevs are removed from the code.
  • J-Job environment variables, COMIN and COMOUT directories, and output follow what has been defined for EVS.
  • Jobs over 15 minutes in runtime have restart capability.
  • If applicable, changes in the dev/drivers/scripts or dev/modulefiles have been made in the corresponding ecf/scripts and ecf/defs/evs-nco.def?
  • Jobs contain the appropriate file checking and don't run METplus for any missing data.
  • Code is using METplus wrappers structure and not calling MET executables directly.
  • Log is free of any ERRORs or WARNINGs.

Testing Instructions

(1) Set up jobs
a. Symlink the EVS_fix directory locally as "fix".
b. In the driver scripts, edit the following environment variables:

HOMEevs - set to your test EVS directory
COMIN - set to /lfs/h2/emc/vpppg/noscrub/emc.vpppg/${NET}/$evs_ver_2d
COMOUT - set to your test output directory
KEEPDATA - set to YES
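
As a rough illustration, the driver-script settings listed above might look like the following. NET=evs and evs_ver_2d=v2.0 are assumptions inferred from the paths quoted elsewhere in this thread, and the HOMEevs and COMOUT values are placeholders to replace with your own directories.

```shell
# Sketch of the driver-script edits for a test run (not the actual driver).
export NET=evs                # assumption: inferred from paths in this thread
export evs_ver_2d=v2.0        # assumption: inferred from paths in this thread
export HOMEevs=/path/to/your/test/EVS      # placeholder: your test EVS clone
export COMIN=/lfs/h2/emc/vpppg/noscrub/emc.vpppg/${NET}/${evs_ver_2d}
export COMOUT=/path/to/your/test/output    # placeholder: your output directory
export KEEPDATA=YES           # keep the working directory after the job ends
```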

(2) Run the atmos prep job
Run the following job in dev/drivers/scripts/prep/aigefs for any VDATE:
qsub jevs_prep_aigefs_atmos.sh

After the job completes, the log file should be free of errors and warnings, and the output atmos.YYYYMMDD should match the one under /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs.
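
One way to check that two prep outputs match is a recursive diff. This is a minimal sketch, not part of the PR; the function name compare_prep_dirs is hypothetical, and on directories of roughly 100G a full content comparison may take a while.

```shell
# Return 0 when the two directory trees have identical contents.
compare_prep_dirs() {
    mine=$1
    reference=$2
    # -r recurses into subdirectories; -q only reports which files differ.
    diff -rq "$mine" "$reference"
}
```

For example, `compare_prep_dirs /path/to/your/prep/aigefs/atmos.YYYYMMDD /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs/atmos.YYYYMMDD` prints nothing and returns 0 when the trees are identical (the first path is a placeholder).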

Comment on lines +7 to +8
# 1. Retrieve/regrid analysis/observational data (1 degree and 1.5 degree for
# WMO).
Contributor

@GwenChen-NOAA Will WMO verification be included in aigefs?

CC @malloryprow @AndrewBenjamin-NOAA

Contributor Author

Not in the near future. The comment about the WMO grid has been removed.


export COMIN=/lfs/h2/emc/vpppg/noscrub/${USER}/$NET/$evs_ver_2d
export COMINaigefs=/lfs/h1/ops/prod/com/aigefs/${aigefs_ver}
export COMINhgefs=/lfs/h1/ops/prod/com/hgefs/${hgefs_ver}
Contributor

@GwenChen-NOAA Can you please remove the COMINaigefs and COMINhgefs lines from the dev driver? These two lines should be specified in the J-job only when the model output is in prod/com/. Thanks!

CC @malloryprow @AndrewBenjamin-NOAA

Contributor Author

Deleted.

@AndrewBenjamin-NOAA self-requested a review March 3, 2026 12:24

#PBS -l place=vscatter:exclhost,select=2:ncpus=48:mem=400GB:prepost=true
#PBS -l debug=true

set -x
Contributor

I think cd $PBS_O_WORKDIR is needed after the set -x so that the job launches in the directory from which it was submitted?

CC: @malloryprow @AliciaBentley-NOAA

Contributor Author

I'm not sure it's necessary. Global_ens driver scripts do not have it.
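
For reference, the pattern being discussed could be sketched as follows. PBS_O_WORKDIR is the variable PBS sets to the submission directory; the wrapper function enter_submit_dir and the fallback to the current directory are assumptions added so the sketch also runs outside PBS. Whether the aigefs driver needs this at all is the open question in this thread.

```shell
# Change to the directory the job was submitted from.
# PBS_O_WORKDIR is set by PBS at submission time; the fallback to the
# current directory is an assumption for running this sketch outside PBS.
enter_submit_dir() {
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
}
```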

@malloryprow
Contributor

Does VERIF_CASE need to be defined for prep? I only found it defined in

  1. dev/drivers/scripts/prep/aigefs/jevs_prep_aigefs_atmos.sh
  2. jobs/JEVS_PREP_AIGEFS
  3. ecf/scripts/prep/aigefs/jevs_prep_aigefs_atmos.ecf

but nothing downstream in scripts/ or ush/. If it isn't needed, can we remove it?

@malloryprow added this to the EVS v2.0.z milestone Mar 3, 2026

@GwenChen-NOAA
Contributor Author

I think it's better to keep it because it is used in the JOBNAME, so the naming convention is consistent for all prep, stats and plots jobs.

@malloryprow
Contributor

It looks like it is mismatched, though, with what is being set with #PBS -N.

dev/drivers/scripts/prep/aigefs/jevs_prep_aigefs_atmos.sh has #PBS -N jevs_prep_aigefs_atmos (line 1) and export job=${PBS_JOBNAME:-jevs_${STEP}_${MODELNAME}_${VERIF_CASE}} (line 35). I think the default value for job should be jevs_${STEP}_${MODELNAME}_${RUN}.

I think then VERIF_CASE becomes an unused variable and can be removed. Unless I may be missing its use elsewhere.
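
To illustrate the default-value expansion suggested above, here is a small sketch. The values of STEP and MODELNAME are assumptions chosen for illustration (RUN=atmos is stated later in this thread), and the function name resolve_job_name is hypothetical. With these values the fallback agrees with the #PBS -N name.

```shell
# Values assumed for illustration; RUN=atmos is stated in this thread.
STEP=prep
MODELNAME=aigefs
RUN=atmos

# Print the job name: PBS_JOBNAME if set and non-empty, else the default.
resolve_job_name() {
    echo "${PBS_JOBNAME:-jevs_${STEP}_${MODELNAME}_${RUN}}"
}
```

Outside PBS, where PBS_JOBNAME is unset, `resolve_job_name` prints jevs_prep_aigefs_atmos.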

@GwenChen-NOAA
Contributor Author

Ok, I removed VERIF_CASE and changed JOBNAME to jevs_${STEP}_${COMPONENT}_${RUN}.

@malloryprow
Contributor

It looks like it is still lingering in JEVS_PREP_AIGEFS#L33.

I appreciate you making the changes! This helps keep the code cleaner and more concise.

Comment on lines +94 to +98
if [ "${RUN}" == "headline" ]; then
mkdir -p $COMOUTaigefs $COMOUThgefs
elif [ "${RUN}" == "atmos" ]; then
mkdir -p $COMOUTgefs $COMOUTaigefs $COMOUThgefs $COMOUTcompleted
fi
Contributor

@GwenChen-NOAA It looks like $RUN is set to atmos in the aigefs prep job. Please remove the part of this if statement that refers to when $RUN is set to headline. If you ever add headline prep to aigefs, this if statement can be added back.

CC @malloryprow @AndrewBenjamin-NOAA

Contributor Author

Deleted.
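
One possible shape for the block once the headline branch is removed is sketched below. Whether the revised code keeps the atmos test or calls mkdir unconditionally is not shown in this thread, and the temporary base directory is a hypothetical stand-in for the real COMOUT paths.

```shell
RUN=atmos
# Hypothetical stand-in for the real output root so the sketch runs anywhere.
base=$(mktemp -d)
COMOUTgefs=$base/gefs
COMOUTaigefs=$base/aigefs
COMOUThgefs=$base/hgefs
COMOUTcompleted=$base/completed

# Only the atmos case remains; a single "=" is the portable string test.
if [ "${RUN}" = "atmos" ]; then
    mkdir -p "$COMOUTgefs" "$COMOUTaigefs" "$COMOUThgefs" "$COMOUTcompleted"
fi
```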

@GwenChen-NOAA
Contributor Author

> It looks like it is still lingering in JEVS_PREP_AIGEFS#L33.
>
> I appreciate you making the changes! This helps keep the code cleaner and more concise.

Thanks! Missed this one.

@AndrewBenjamin-NOAA
Contributor

Thanks for the latest commit @GwenChen-NOAA. I will pull in your changes and begin testing.

CC: @malloryprow @AliciaBentley-NOAA

@AndrewBenjamin-NOAA
Contributor

@GwenChen-NOAA The prep job has finished. The job completed successfully without errors, but there were a few warnings about missing data in the emc.vpppg prep space, tied to missing CCPA data:

WARNING: /lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/prep/aigefs/atmos.20260302/gefs/ccpa.t12z.grid3.06h.f00.grib2 is not available
WARNING: /lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/prep/aigefs/atmos.20260302/gefs/ccpa.t06z.grid3.06h.f00.grib2 is not available
WARNING: /lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/prep/aigefs/atmos.20260302/gefs/ccpa.t00z.grid3.06h.f00.grib2 is not available
WARNING: /lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/prep/aigefs/atmos.20260301/gefs/ccpa.t18z.grid3.06h.f00.grib2 is not available

The prep data looks to match the data in /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs/atmos.20260302. The sizes of the prep subdirectories match as well:

du /lfs/h2/emc/vpppg/noscrub/andrew.benjamin/evs/v2.0/prep/aigefs/atmos.20260302
28G     ./aigefs
436K    ./completed
23G     ./gefs
51G     ./hgefs
101G    .
du /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs/atmos.20260302
28G     ./aigefs
436K    ./completed
23G     ./gefs
51G     ./hgefs
101G    .

Can you confirm whether your parallel had similar CCPA file warnings, and confirm that my prep output matches your own?

Logs: /lfs/h2/emc/vpppg/noscrub/andrew.benjamin/pr915/EVS/dev/drivers/scripts/prep/aigefs/jevs_prep_aigefs_atmos.o98787317
Data: /lfs/h2/emc/stmp/andrew.benjamin/evs/prod/tmp/jevs_prep_aigefs_atmos.98787317.dbqs01
Prep: /lfs/h2/emc/vpppg/noscrub/andrew.benjamin/evs/v2.0/prep/aigefs/atmos.20260302

CC: @malloryprow @AliciaBentley-NOAA

@AndrewBenjamin-NOAA
Contributor

@GwenChen-NOAA

Regarding the CCPA warning: according to the log file, the job is looking for the ccpa files in /lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/prep/aigefs/atmos.20260302/gefs/. I think this may be because your instructions set COMIN to the emc.vpppg parallel, so it is looking for ccpa grib2 files in an evs directory that doesn't exist. Is there another directory that I should add to the instructions to make sure this data is found?

@GwenChen-NOAA
Contributor Author

@AndrewBenjamin-NOAA, the CCPA warning is normal because the 24-hr precip accumulation needs CCPA files from the previous day, and the job couldn't find them under /lfs/h2/emc/vpppg/noscrub/andrew.benjamin/evs/v2.0/prep/aigefs. If you want to make sure, you can copy /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs/atmos.20260301 to your prep directory and rerun the prep job for 20260302 (remember to delete the old atmos.20260302/ first). Or, wait until tomorrow and test it again. The CCPA warning should disappear then.
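
The dependency described here can be checked before rerunning. The sketch below tests for the four 6-hourly CCPA files named in the warnings above (18z from the previous day, then 00z, 06z and 12z) and prints a warning in the same style for any that are missing or empty. The function name check_ccpa_inputs and its arguments are hypothetical; this is not the check used inside the prep scripts.

```shell
# Check the four 6-h CCPA files used for a 24-h accumulation.
# Arguments: prep root directory, day (YYYYMMDD), previous day (YYYYMMDD).
check_ccpa_inputs() {
    prep_root=$1
    day=$2
    prev_day=$3
    missing=0
    for f in \
        "$prep_root/atmos.$prev_day/gefs/ccpa.t18z.grid3.06h.f00.grib2" \
        "$prep_root/atmos.$day/gefs/ccpa.t00z.grid3.06h.f00.grib2" \
        "$prep_root/atmos.$day/gefs/ccpa.t06z.grid3.06h.f00.grib2" \
        "$prep_root/atmos.$day/gefs/ccpa.t12z.grid3.06h.f00.grib2"
    do
        # -s is true only for files that exist and are non-empty.
        if [ ! -s "$f" ]; then
            echo "WARNING: $f is not available"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when all four files were found.
    [ "$missing" -eq 0 ]
}
```

For example, `check_ccpa_inputs /path/to/your/prep/aigefs 20260302 20260301` (the first argument is a placeholder) returns non-zero until the atmos.20260301 directory has been copied in.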

@AndrewBenjamin-NOAA
Contributor

@GwenChen-NOAA Should the ccpa files be in their own directory in atmos.$PDY rather than in the gefs directory?

Also, for retesting, given the production switch, can you please provide testing instructions where we could potentially get a clean run without warnings on cactus?

CC: @malloryprow @AliciaBentley-NOAA

@GwenChen-NOAA
Contributor Author

All obs files (ccpa, gfs anl, gdas etc.) are located in the gefs directory. This is the same structure used in global_ens.

For testing, please copy /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs/atmos.20260302 to your prep directory, then run the prep job. Your prep job will create a new atmos.20260303 directory, which should be the same as mine at /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs.
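
The copy step could be scripted roughly as below. The helper name seed_prep_day is hypothetical. It refuses to overwrite an existing day directory, in line with the earlier advice to delete an old atmos directory before rerunning.

```shell
# Copy one day's prep directory into your own prep root.
# Arguments: source day directory, destination prep root.
seed_prep_day() {
    src_day_dir=$1
    dest_root=$2
    if [ ! -d "$src_day_dir" ]; then
        echo "missing source directory: $src_day_dir" >&2
        return 1
    fi
    # Avoid nesting or mixing data if the day already exists at the destination.
    if [ -e "$dest_root/$(basename "$src_day_dir")" ]; then
        echo "destination already exists; remove it first" >&2
        return 1
    fi
    mkdir -p "$dest_root" && cp -r "$src_day_dir" "$dest_root"/
}
```

For example, `seed_prep_day /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/evs/v2.0/prep/aigefs/atmos.20260302 /path/to/your/prep/aigefs` (the destination is a placeholder), followed by `qsub jevs_prep_aigefs_atmos.sh` from the driver directory.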

@malloryprow
Contributor

@GwenChen-NOAA, are these gefs files under the aigefs prep unique in some way from those that are found in global_ens prep, or are they the same?

@GwenChen-NOAA
Contributor Author

@malloryprow, they are not the same. They contain different variables and levels.

@AndrewBenjamin-NOAA
Contributor

@GwenChen-NOAA this is only partially true. The gefs prep data in global_ens has 70 grib records, while the gefs prep data in aigefs contains a subset of that file with only 50 records. Otherwise the files are identical: they are on the same grid, and the 50 records that match between the files contain the same data.

1. RH:10 mb
2. RH:50 mb
3. RH:100 mb
4. RH:200 mb
5. RH:250 mb
6. RH:500 mb
7. RH:700 mb
8. RH:850 mb
9. RH:925 mb
10. RH:1000 mb
11. RH:2 m above ground
12. DPT:2 m above ground
13. HGT:cloud ceiling
14. CAPE:surface
15. ICEC:surface
16. SNOD:surface
17. TMP:surface
18. VIS:surface
19. WEASD:surface
20. TCDC:entire atmosphere
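
A comparison like this can be reproduced from two text inventories with one VAR:LEVEL entry per line. The sketch below assumes such lists already exist (they could, for instance, be cut from a GRIB2 inventory listing, a step not shown here); the function name records_only_in_first is hypothetical.

```shell
# Print entries that appear in the first inventory but not in the second.
# Arguments: two text files with one "VAR:LEVEL" entry per line.
records_only_in_first() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # comm requires sorted input; -u also drops duplicate entries.
    sort -u "$1" > "$a_sorted"
    sort -u "$2" > "$b_sorted"
    # -2 suppresses lines unique to the second file, -3 suppresses common lines.
    comm -23 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```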

CC: @malloryprow @AliciaBentley-NOAA

@GwenChen-NOAA
Contributor Author

That's fair. You can say that the current gefs files under aigefs are a subset of those under global_ens, since aigefs was adapted from global_ens and we needed it up and running in a short time. Jun Wang has asked for additional variables and levels that are not verified in global_ens; I will add those in the next development cycle. She also asked for additional metrics that are not used in the loss functions. We are also considering reducing the grid resolution to better quantify AI models' blurriness/sharpness. So aigefs and global_ens will grow more distinct in the future as they address the different needs of AI model verification and physics-based model verification.
