Error: The defined _FillValue and missing_value are not the same #308

Closed
cehbrecht opened this issue Nov 29, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@cehbrecht
Collaborator

  • clisops version: 0.12.0
  • Python version:
  • Operating System:

Description

I have run a subsetting operation on a CMIP6 dataset:

  • ds_id = c3s-cmip6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.day.tas.gn.v20190227
  • time="2000/2000"
  • area="5,49,7,51"

CLISOPS failed with the following error message:

File "/usr/local/anaconda/envs/rook/lib/python3.11/site-packages/clisops/ops/subset.py", line 233, in subset
    return op.process()
           ^^^^^^^^^^^^
  File "/usr/local/anaconda/envs/rook/lib/python3.11/site-packages/clisops/ops/base_operation.py", line 146, in process
    processed_ds = self._remove_redundant_fill_values(processed_ds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda/envs/rook/lib/python3.11/site-packages/clisops/ops/base_operation.py", line 94, in _remove_redundant_fill_values
    raise Exception(
Exception: The defined _FillValue and missing_value for 'tas' are not the same '1.0000000200408773e+20' != '1e+20'.

This error does not happen for all datasets, and our tests did not cover this case. I suspect it is caused by the newer xarray and/or cf_xarray version used in clisops v0.12.0.

This error does not appear in clisops v0.10.1 used by rook v0.11.0.

What I Did

I have reproduced the error with the following notebook:

https://nbviewer.org/github/roocs/rooki/blob/master/notebooks/errors/cds-error-2023-11-28-subset-cmip6.ipynb

It is using rooki:

from rooki import operators as ops

wf = ops.Subset(
    ops.Input(
        'tas', ['c3s-cmip6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.day.tas.gn.v20190227']
    ),
    area="5,49,7,51",
    time="2000/2000"
)
resp = wf.orchestrate()
resp.ok

An example file for testing can be this one:
https://data.mips.copernicus-climate.eu/thredds/fileServer/esg_c3s-cmip6/CMIP/NCAR/CESM2-WACCM/historical/r1i1p1f1/day/tas/gn/v20190227/tas_day_CESM2-WACCM_historical_r1i1p1f1_gn_20000101-20091231.nc

@cehbrecht cehbrecht added the bug Something isn't working label Nov 29, 2023
@cehbrecht
Collaborator Author

cehbrecht commented Nov 29, 2023

The error is caused by this check:

if fval != mval:
    raise Exception(
        f"The defined _FillValue and missing_value for '{var}' are not the same '{fval}' != '{mval}'."
    )

This code was added with the regrid operator in release 0.12. So, it is not an issue with cf_xarray dependencies.

@cehbrecht
Collaborator Author

@sol1105 I have added a quick fix in PR #309. It converts the exception into a warning message. What would be a meaningful solution?

@sol1105
Contributor

sol1105 commented Nov 30, 2023

In #243 I updated your xarray _FillValue workaround, _remove_redundant_fill_values (issues #198 / #224). My changes cause this problem.

Since xarray adds NaN as _FillValue when no _FillValue is defined, the updated workaround does the following:

  • If the source data has neither a missing_value nor a _FillValue attribute defined, set _FillValue to None so that xarray does not set it to NaN.
  • If both are defined, check that they are the same. This is basically a data quality check that I never thought would trigger for C3S data.

In the case of this dataset, _FillValue is defined as a float (as it should be), while missing_value is defined as a double.
This is still a flaw in the source data, but not one that should prevent processing.
One could use numpy.isclose in the check and set both the missing_value and _FillValue attributes to the same data type as the data.
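
The mismatch in the error message can be reproduced without opening any file. The following is a minimal sketch using only the Python standard library: struct round-trips 1e20 through 32-bit storage, and math.isclose stands in for the numpy.isclose suggested above. The helper name as_float32 and the tolerance are illustrative assumptions, not taken from clisops.

```python
import math
import struct


def as_float32(value: float) -> float:
    """Round-trip a Python float (64-bit) through 32-bit storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


fill_value = as_float32(1e20)  # _FillValue stored as a 32-bit float
missing_value = 1e20           # missing_value stored as a 64-bit double

print(repr(fill_value))             # 1.0000000200408773e+20
print(fill_value != missing_value)  # True: the strict equality check fails

# A tolerance-based comparison treats the two as the same fill value.
# math.isclose is a stdlib stand-in; rel_tol is an illustrative choice.
print(math.isclose(fill_value, missing_value, rel_tol=1e-5))  # True
```

The two attributes describe the same fill value; they only differ in the precision at which 1e20 was stored.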

@sol1105
Contributor

sol1105 commented Nov 30, 2023

Update: When xarray opens such a dataset it prints a warning:

xarray/conventions.py:543: SerializationWarning: variable 'tas' has multiple fill values {1e+20, 1e+20}, decoding all values to NaN.

After looking into the xarray code, when writing out the data to disk...

  • ... xarray casts the _FillValue / missing_value attributes to the same data type as the data
  • ... xarray will fail encoding when the values of both attributes are "not close" (rtol=1e-5, atol=1e-8)
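
The first point explains why the mismatch disappears on write: once both attributes are cast to the data type of the variable, they compare equal. Below is a small sketch, assuming the variable is stored as 32-bit float as in the tas example. to_float32 is a hypothetical helper for illustration, not xarray code.

```python
import struct


def to_float32(value: float) -> float:
    """Cast a value to 32-bit float precision (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Attribute values as reported in the exception message.
fill_value = 1.0000000200408773e+20  # already at float32 precision
missing_value = 1e+20                # double precision

# Cast both to the dtype of the data (assumed to be float32 here).
cast_fill = to_float32(fill_value)
cast_missing = to_float32(missing_value)

print(cast_fill == cast_missing)  # True
```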

So I guess that if _FillValue != missing_value, we can just set _FillValue = missing_value, issue a warning, and let xarray handle the dtype, since it properly decodes the missing values.
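
This proposal could look roughly like the following. It is a sketch on a plain attribute dictionary, not the change made in PR #309: harmonize_fill_values is a hypothetical name, and a real implementation would presumably operate on the attrs and encoding of each xarray variable.

```python
import warnings


def harmonize_fill_values(var_name: str, attrs: dict) -> dict:
    """Return a copy of attrs in which _FillValue matches missing_value.

    Hypothetical helper: it only acts when both attributes are defined
    and differ, and it warns instead of raising.
    """
    result = dict(attrs)
    fval = result.get("_FillValue")
    mval = result.get("missing_value")
    if fval is not None and mval is not None and fval != mval:
        warnings.warn(
            f"The defined _FillValue and missing_value for '{var_name}' "
            f"are not the same '{fval}' != '{mval}'. Using '{mval}' for both."
        )
        result["_FillValue"] = mval
    return result


attrs = {"_FillValue": 1.0000000200408773e+20, "missing_value": 1e+20}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fixed = harmonize_fill_values("tas", attrs)

print(fixed["_FillValue"] == fixed["missing_value"])  # True
print(len(caught))                                    # 1
```

Returning a copy keeps the original attributes available for the warning message and leaves the dtype handling to the write step.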

@cehbrecht
Collaborator Author

Fixed by PR #309.
