fix is_time to avoid memory overload #397
base: master
for more information, see https://pre-commit.ci
Pull Request Test Coverage Report for Build 13785665732
💛 - Coveralls
clisops/utils/dataset_utils.py
Outdated
def is_time(coord):
    """
    Determines if a coordinate is time.

    :param coord: coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
    :return: (bool) True if the coordinate is time.
    """
    if coord.ndim >= 2:
Should it really be `False` by default for `time_bnds`, so should `time_bnds` not be considered as time coordinate?
I don't know ... but if I skip the check/filter then I have to deal with it at other places ... see L106.
In l64, in function `get_coord_by_type`, one could replace `coord_vars = list(ds.coords) + list(ds.data_vars)` with `coord_vars = (list(ds.coords) + list(ds.data_vars)).remove(get_main_variable(ds))`; then `is_time` should no longer be run for variables that do not fit in memory, as it was before.
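A note on the suggested one-liner: in Python, `list.remove()` modifies the list in place and returns `None`, so assigning its result would leave `coord_vars` set to `None`. Below is a minimal sketch of the same idea that builds a filtered list instead. `candidate_coord_names` is a hypothetical helper name, and `get_main_variable(ds)` is taken from the comment above rather than checked against the clisops source.

```python
def candidate_coord_names(coord_names, data_var_names, main_var):
    """Return coordinate and data-variable names, excluding the main variable.

    Hypothetical helper: the argument names are illustrative only.
    """
    # list.remove() mutates in place and returns None, so build a new list
    # rather than chaining .remove() onto the concatenation.
    return [name for name in [*coord_names, *data_var_names] if name != main_var]


names = candidate_coord_names(["time", "lat", "lon"], ["tas", "time_bnds"], "tas")
print(names)  # ['time', 'lat', 'lon', 'time_bnds']
```

With a dataset this would presumably be called as `candidate_coord_names(ds.coords, ds.data_vars, get_main_variable(ds))`, since iterating `ds.coords` and `ds.data_vars` yields variable names.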
clisops/utils/dataset_utils.py
Outdated
        return coord_id, [x for x in coords if x != coord_id]
    else:
        return coord_id
    if coord_id in ds.coords:
make sure coord_id is in ds.coords (lat_bnds is not)
I have added this. There was another issue that
@sol1105 tests are working now. We have updated to the latest Not sure how we solve this in future. We could patch
@cehbrecht I asked my colleague @aulemahal to look into the issue on the For reference: pydata/xarray#8821
@Zeitsperre @sol1105 good to go?
Looks good. Small changes.
for coord_id in coords:
    if coord_id in ds.coords:
You could probably flatten this with:
if coord_id in coords and coord_id in ds.coords:
Check if a coordinate uses cftime datetime objects.
Handles Dask-backed arrays for lazy evaluation.
Check if a coordinate uses cftime datetime objects.
Handles Dask-backed arrays for lazy evaluation.
@pytest.mark.skipif(xe is None, reason=XESMF_IMPORT_MSG)
def test_regrid_basic(tmpdir, tmp_path, mini_esgf_data, xfail_if_xarray_incompatible):
Is the `xfail_if_xarray_incompatible` fixture obsolete now? If so, we should remove it from tests/conf.py
@cehbrecht Don't forget to update the
Pull Request Checklist:
What kind of change does this PR introduce?:
This is a fix for the `is_time` function to avoid memory overload when using `coord.values` for a data variable.
Does this PR introduce a breaking change?:
Other information:
This issue occurred when testing atlas-v2 data with very large datasets (several GBs):
https://github.com/roocs/rook/blob/dev-atlas-v2/notebooks/atlas-v2.ipynb
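To illustrate the kind of check described above, here is a minimal sketch of a time test that inspects the dtype and at most one element instead of calling `coord.values` on the whole array. It is not the implementation merged in this PR: `is_time_lazy` is a hypothetical name, the function is duck-typed against the `ndim`, `dtype`, `size`, `dims` and `isel` attributes that an `xarray.DataArray` provides, and the fallback to standard-library datetimes when `cftime` is missing is an assumption made so the sketch runs anywhere.

```python
import datetime

try:
    # cftime is optional in this sketch (assumption, not taken from the PR).
    import cftime

    _DATETIME_TYPES = (datetime.datetime, cftime.datetime)
except ImportError:
    _DATETIME_TYPES = (datetime.datetime,)


def is_time_lazy(coord):
    """Hypothetical variant of is_time that avoids loading the full array."""
    # Multi-dimensional variables (e.g. time_bnds or a large data variable)
    # are rejected up front, mirroring the ndim check in the diff.
    if coord.ndim >= 2:
        return False

    # numpy reports datetime64 arrays with dtype kind "M"; no data is read.
    kind = getattr(coord.dtype, "kind", None)
    if kind == "M":
        return True

    # Object arrays may hold datetime-like objects: look at one element only.
    if kind == "O" and coord.size > 0:
        first = coord.isel({dim: 0 for dim in coord.dims}).values.item()
        return isinstance(first, _DATETIME_TYPES)

    return False
```

For a Dask-backed coordinate, selecting a single element before accessing `.values` should only require computing the chunk that holds that element, which is the point of the design; whether that matches the behaviour needed for the atlas-v2 data would have to be confirmed against the real datasets.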