ERA5 catalog progress + Python issue #36
Comments
Thanks @meteorologist15! This helps to see how we can use the catalog builder to generate the modified CSV, as discussed.
TODO: open new issues for the dev and testing with catalog builder
Catalog example generated with the Catalog Builder for the ERA5 dataset (pressure levels, geopotential variable, 300 hPa):
The categories preserved are experiment_id, variable_id, and path. The configuration used:
@meteorologist15 I’m trying to run this. Are you using the main branch from this repository?
Locally committed a small change to gfdlcrawler to account for filenames without a "." in their names. The commit still needs to be pushed to the branch on GitHub.
Three separate issues exist:
1) Filenames with multi-word variable names separated by an underscore, if the "_" character in filenames is to be checked.
2) If using "_" as a separator, properly capturing/resolving "monthly_averaged" in the filenames of monthly averaged datasets. Some more fundamental changes to the crawler script may be necessary.
3) Variable names in the path that differ from the filename.
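To make the underscore problem concrete, here is a minimal sketch of one possible way to split a filename on "_" while keeping known multi-word tokens such as "monthly_averaged" together. This is not the gfdlcrawler implementation: the function name `split_stem`, the token vocabulary, and the example filename are all hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Hypothetical vocabulary of multi-word tokens that must not be split on "_".
# The real set would depend on the ERA5 filenames actually being crawled.
MULTIWORD_TOKENS = (
    "monthly_averaged",
    "boundary_layer_height",
    "2m_temperature",
)


def split_stem(filename, multiword=MULTIWORD_TOKENS):
    """Split a filename stem on "_" while keeping known multi-word tokens intact."""
    # Path.stem drops the last suffix and also works when the name has no ".".
    stem = Path(filename).stem
    parts = stem.split("_")

    # Pre-split the vocabulary; try longer entries first so they win the match.
    vocab = sorted((tok.split("_") for tok in multiword), key=len, reverse=True)

    tokens = []
    i = 0
    while i < len(parts):
        for words in vocab:
            if parts[i:i + len(words)] == words:
                # Rejoin the pieces of a recognized multi-word token.
                tokens.append("_".join(words))
                i += len(words)
                break
        else:
            # No vocabulary entry starts here; keep the single piece.
            tokens.append(parts[i])
            i += 1
    return tokens
```

With the hypothetical name `era5_monthly_averaged_boundary_layer_height_2020.nc`, this yields the tokens `era5`, `monthly_averaged`, `boundary_layer_height`, and `2020`. It does not address the third issue (variable names in the path that differ from the filename), which would need a separate mapping.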
Great, thanks. You may use this as a reference. For now, the fastest approach is fine, even if it is not the perfect one. https://docs.google.com/document/d/17nlIgSQPwL1MFqwHlRV8R5vCpug08r71tM75poGpQtc/edit#heading=h.60aeh5dnv42m
The manual catalog for ERA5 data, coupled with the JSON generated by the CatalogBuilder, can be ingested by intake-esm, but only partially. The unmodified catalog contains the following data:
The following is also run:
Executing it produces the following error:
After removing the offending datasets (in this case, the files containing t2m (2-meter temperature) and blh (boundary layer height)), I am able to successfully generate output from the "to_dataset_dict()" method. Example below:
Path to unmodified catalog (CSV): /nbhome/Kristopher.Rand/uda/catalogs/ERA5_initCatalog_slimmed.csv
Path to unmodified catalog's associated JSON: /nbhome/Kristopher.Rand/uda/catalogs/ERA5_initCatalog_slimmed.json
Path to modified catalog (CSV): /nbhome/Kristopher.Rand/uda/catalogs/ERA5_initCatalog_slimmed_modified.csv
Path to modified catalog's associated JSON: /nbhome/Kristopher.Rand/uda/catalogs/ERA5_initCatalog_slimmed_modified.json
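The removal of the offending entries described above (the rows for t2m and blh) could be sketched as follows. This is a minimal illustration using only the standard library, assuming the catalog CSV has a `variable_id` column as listed among the preserved categories; the helper name `drop_variables` is hypothetical, and this is not necessarily how the modified catalog was actually produced.

```python
import csv
import io


def drop_variables(csv_text, offending, column="variable_id"):
    """Return catalog CSV text without rows whose `column` value is in `offending`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # Reuse the original header so the column order is preserved.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] not in offending:
            writer.writerow(row)
    return out.getvalue()
```

For example, reading the unmodified CSV into a string, calling `drop_variables(text, {"t2m", "blh"})`, and writing the result to a new file would give a catalog without those two variables, which can then be paired with its JSON description and opened with intake-esm.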