Regridding functionalities (powered by xESMF) #243

Merged
merged 180 commits into from
Nov 23, 2023

Conversation

sol1105
Contributor

@sol1105 sol1105 commented Aug 17, 2022

Pull Request Checklist:

  • What kind of change does this PR introduce?:

    • Extending Fix 224 inconsistent bounds #225 for all data_vars and coords of the xarray object (requires cf-xarray >= 0.7.5)
    • Adding jupyter notebook to showcase remapping functionalities
    • Adding regridding functionalities (powered by xESMF):
      • clisops.ops.regrid.regrid: one-line remapping function orchestrating the functions and classes below
      • clisops.core.Grid class:
        • create xarray.Dataset holding description / coordinates of a regular lat-lon grid in CF compliant format:
          • from xarray.Dataset/DataArray (optionally of adaptive resolution, if source is not a regular lat-lon grid)
          • via grid_instructor (creating regional or global grid)
          • selecting a pre-defined grid (https://github.com/roocs/roocs-grids)
        • reformat grids (SCRIP, CF, xESMF formats)
        • detect extent, shape, format, type of the grid
        • detect collapsing or duplicated cells
        • create hash, compare grid objects
        • save to disk
        • exchange attributes and non-horizontal coordinates between datasets
        • calculate bounds (for regular lat-lon and curvilinear grids)
        • re-define data_vars and coords of xarray.Dataset for xESMF application
      • clisops.core.Weights:
        • orchestrate creation of weights with xESMF
        • hold the xesmf.Regridder object
        • read from or store to a local remapping weights cache (lock file mechanism for concurrent weights creation)
        • generate hash to identify similar weights
      • clisops.core.regrid:
        • application of remapping weights on xarray.Dataset/DataArray
        • optionally transfer attributes and non-horizontal coordinate variables from source to target dataset
        • set new attributes related to the remapping operation
    • Adding lock file mechanic (from source)
  • Does this PR introduce a breaking change?:

    • adding regrid operator in ops
    • adding regrid function, Grid and Weights classes to core
    • the remapping makes use of xesmf, which is already a dependency
    • adding dependency for roocs-grids (might be removed with resolving Move target grids location to configuration #168)
    • potentially TBA
  • Other information:

Future planned PR(s) specific for remapping:

  • Support for manually provided masks
  • Support for out-of-domain masking for nearest neighbour (likely better placed in xesmf)
  • Deal with the periodic attribute of xesmf (xesmf resets periodic to False for conservative remapping, but probably should not)
  • Support datasets with shifted longitude frames (e.g. ranging from (-60, 300) degrees_east) - base work already done in previous PR
  • Support reformatting from/to further formats
  • Support grid type detection for other formats
  • Support reading / reformatting / using weight files from other tools like nco, cdo, ...
  • Calculate nominal_resolution of the target grid, if not present
  • Set up central / web based remapping weights cache and synchronizing cron job
  • Find solution for vector variables / variables defined on cell edges
  • Support vertical interpolation (e.g. with xgcm)
  • (Support other remapping backends than xesmf in the far future)

Future planned PR(s) generally for clisops:

  • Unify the detect_coordinate functions
  • Attribute a new tracking_id / PID (general requirement for clisops)
  • Support datasets with missing missing_value / _FillValue attribute that feature missing values (add fix in dachar?)
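
The weights cache described in the checklist above (a hash that identifies equivalent weights, plus a lock file so that concurrent processes do not create the same weights twice) could look roughly like the sketch below. It uses only the standard library and is not the clisops implementation: `weights_id`, `create_weights_once`, the JSON file layout, and the choice to raise when the lock is held are all hypothetical.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def weights_id(source_grid_hash, target_grid_hash, method):
    """Derive a stable cache key from both grid hashes and the remapping method."""
    key = "|".join((source_grid_hash, target_grid_hash, method))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def create_weights_once(cache_dir, wid, compute):
    """Return cached weights; compute them only if nobody else holds the lock."""
    weights_file = cache_dir / f"weights_{wid}.json"
    lock_file = cache_dir / f"weights_{wid}.lock"
    if weights_file.exists():
        return json.loads(weights_file.read_text())
    try:
        # O_EXCL makes the creation atomic: only one process can win the lock.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"Weights {wid} are being created elsewhere.") from None
    try:
        weights = compute()
        weights_file.write_text(json.dumps(weights))
        return weights
    finally:
        # Always release the lock, even if compute() failed.
        os.close(fd)
        os.remove(lock_file)


# Usage: the second call is served from the cache, so compute runs once.
calls = []


def fake_compute():
    calls.append(1)
    return {"method": "conservative", "n_weights": 3}


cache = Path(tempfile.mkdtemp())
wid = weights_id("abc", "def", "conservative")
first = create_weights_once(cache, wid, fake_compute)
second = create_weights_once(cache, wid, fake_compute)
```

Raising on a held lock keeps the sketch short; a real cache might instead wait and retry until the weights file appears.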

ellesmith88 and others added 30 commits March 4, 2021 10:39
- Fix typo importing cf_xarray
- Fix missing brackets in if statement in Grid.__repr__()
- Fix reformatting the bounds in Grid.grid_reformat() when reformatting xESMF to CF.
  Now using cfxr.vertices_to_bounds instead of _reravel
- Now updating the format attribute when executing Grid.grid_reformat()
- Added compute_bounds for format CF and grid_type regular_lat_lon
- Added basic adaptive_masking function that will be obsolete as soon as adaptive masking is implemented in xesmf.
- Added basic regrid function.
- Added basic Weights class. The method add_matrix_NaNs to mask out-of-source-domain grid cells in the remapping weights will likely be obsolete with future xesmf versions. For now, weights can only be generated by xesmf by specifying the input and output grids as well as the remapping method.
- Minor changes to the Grid class:
  - Default attributes mostly set to None.
  - Now raising exception when none of the input parameters are specified (Dataset, grid_id or grid_instructor).
  - Added temporary fix for a small cf_xarray bug that identifies lat_bnds/lon_bnds as latitude and longitude coordinates if the bounds are specified as xarray.Dataset.coords and they have a units attribute.
  - detect_coords now uses roocs_utils.xarray_utils.get_coord_by_attr
- Cleared open questions regarding tests/core/test_regrid.py leading to minor changes.
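
For a regular lat-lon grid, cell bounds can be derived from the cell centres by taking the midpoints between neighbours and extrapolating half a cell at both ends. The sketch below illustrates that general idea in plain Python. It is an assumption about the approach, not the code behind compute_bounds, and `bounds_from_centers` and its parameters are hypothetical.

```python
def bounds_from_centers(centers, lower_limit=None, upper_limit=None):
    """Return (lower, upper) bounds for each cell centre of a 1D axis."""
    if len(centers) < 2:
        raise ValueError("At least two cell centres are needed.")
    # Outer edges: extrapolate half the neighbouring spacing.
    edges = [centers[0] - (centers[1] - centers[0]) / 2]
    # Inner edges: midpoints between adjacent centres.
    for left, right in zip(centers, centers[1:]):
        edges.append((left + right) / 2)
    edges.append(centers[-1] + (centers[-1] - centers[-2]) / 2)
    # Optionally clip, e.g. so latitude bounds stay within [-90, 90].
    if lower_limit is not None:
        edges = [max(edge, lower_limit) for edge in edges]
    if upper_limit is not None:
        edges = [min(edge, upper_limit) for edge in edges]
    return list(zip(edges[:-1], edges[1:]))


lat_bnds = bounds_from_centers([-67.5, -22.5, 22.5, 67.5], -90.0, 90.0)
```

Each centre gets one (lower, upper) pair, which corresponds to the two-vertex bounds layout CF uses for 1D coordinates.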
… core/regrid.py

- Added the necessary methods to the Regrid class in ops/regrid.py
  However, adaptive masking can still result in an error when providing an xarray.Dataset as input.
  Also, the xesmf regridding seems to drop a few variables and to keep others that need not be kept.

- Added Grid.__str__() method to core/regrid.py
- Added Grid.remove_halo() method to core/regrid.py. For partially duplicated rows/columns an exception will be raised.
  Fully duplicated rows/columns will be cut off.
- Added unmapped_to_nan parameter for init calls to xesmf.Regridder class, replaces Weights.add_matrix_NaNs
- Added parameters skip_na and na_thres for calls to xesmf.Regridder instances, replacing custom adaptive_masking function calls
- core/regrid.py - calls to roocs_utils.xarray_utils.get_coord_by_type adjusted to include ignore_aux_coords=False to allow 2D coordinate variables to be identified as coordinates
- Requirement for xesmf set to xesmf>=0.6.0
- test_grid_init_ds_tos_curvilinear adjusted for removed halo
- Regridding tests only run for xesmf>=0.6
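
The effect of the unmapped_to_nan, skip_na and na_thres parameters mentioned above can be illustrated with a small pure-Python sketch. It mimics the idea only (renormalise over valid source cells and mask a target cell when too large a share of its contributing weight is missing). It does not call xesmf, and xesmf's internals may differ. `apply_weights` and the sparse weights layout are hypothetical.

```python
import math


def apply_weights(weights, values, skip_na=False, na_thres=1.0):
    """Apply remapping weights; each row lists (source_index, weight) pairs."""
    result = []
    for row in weights:
        total = sum(weight for _, weight in row)
        if total == 0:
            # No source cell maps here: NaN instead of 0 (cf. unmapped_to_nan).
            result.append(math.nan)
            continue
        if not skip_na:
            # Plain application: a single NaN source value propagates.
            result.append(sum(weight * values[i] for i, weight in row))
            continue
        valid = [(i, weight) for i, weight in row if not math.isnan(values[i])]
        valid_weight = sum(weight for _, weight in valid)
        missing_fraction = 1 - valid_weight / total
        if valid_weight == 0 or missing_fraction > na_thres:
            result.append(math.nan)
        else:
            # Renormalise by the weight of the valid source cells only.
            result.append(sum(weight * values[i] for i, weight in valid) / valid_weight)
    return result


source = [1.0, math.nan, 3.0, 5.0]
weights = [[(0, 0.5), (1, 0.5)], [(2, 0.5), (3, 0.5)], []]

plain = apply_weights(weights, source)
lenient = apply_weights(weights, source, skip_na=True, na_thres=0.5)
strict = apply_weights(weights, source, skip_na=True, na_thres=0.25)
```

With na_thres=0.5 the first target cell, half of whose weight comes from a missing source cell, is still computed from the remaining valid cell. With na_thres=0.25 it is masked.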
@sol1105
Contributor Author

sol1105 commented Nov 10, 2023

@Zeitsperre @cehbrecht I removed Python 3.8 from the CI checks because of the cf_xarray problem you addressed in roocs_utils using the following lines in the requirements.txt:

cf-xarray>=0.3.1,<=0.8.4; python_version == '3.8'
cf-xarray>=0.3.1; python_version >= '3.9'

I am not sure how such a setting can be added in the dependencies entry of the pyproject.toml, in case python 3.8 is still required. Would it simply be:

dependencies = [
  "bottleneck>=1.3.1",
  # cf-xarray is differently named on conda-forge
  "cf-xarray>=0.8.6;python_version>='3.9'",
  "cf-xarray>=0.7.5,<=0.8.0;python_version=='3.8'",
...

@Zeitsperre
Collaborator

@sol1105

I am not sure how such a setting can be added in the dependencies entry of the pyproject.toml, in case python 3.8 is still required. Would it simply be:

dependencies = [
  "bottleneck>=1.3.1",
  # cf-xarray is differently named on conda-forge
  "cf-xarray>=0.8.6; python_version>='3.9'",
  "cf-xarray>=0.7.5,<=0.8.0; python_version=='3.8'",
...

That's exactly what you would need to do to maintain Python 3.8 support. Everything is using pip under the hood. The only thing I'm not certain about is the upper bound: I believe cf_xarray is compatible with Python 3.8 up until v0.8.4 (at least, that's what I indicated in other places).

@cehbrecht
Collaborator

@Zeitsperre I leave the last words to you :) We would like to merge this PR and make the new clisops release with the regrid operator. I have prepared daops and rook already for the new operator and it works.

@Zeitsperre
Collaborator

Hey all,

Working on this now. I'm going to be adding a few changes to docstrings and updating some deprecated calls. I also might split the failing upstream build into its own workflow so that it isn't constantly failing. Will give everything a final overview after that's all done.

Should have something today.

Collaborator

@Zeitsperre Zeitsperre left a comment

There's a ton going on here, but we've collectively been examining this PR for a very long time. Things look good beyond a handful of small issues I noticed when cleaning up the typing/docstrings. Congrats!

Zeitsperre and others added 3 commits November 20, 2023 16:37
- Addressed review comments regarding environment.yml
- Added test: regridding of data array
- core.regrid: Fixed problems in case of DataArray instead of Dataset
@sol1105
Contributor Author

sol1105 commented Nov 21, 2023

@Zeitsperre Thanks a lot for the review and your contribution. I addressed the outstanding issues you found

@Zeitsperre
Collaborator

Well done. Feel free to merge whenever you're ready!

@sol1105 sol1105 merged commit a999bcb into master Nov 23, 2023
11 checks passed
@sol1105 sol1105 deleted the regrid-main branch November 23, 2023 10:42