More extract features for RDRS #182

Merged
merged 13 commits into from
Apr 6, 2023
2 changes: 2 additions & 0 deletions HISTORY.rst
@@ -20,6 +20,8 @@ New features and enhancements
* Allow passing ``GeoDataFrame`` instances in ``spatial_mean``'s ``region`` argument, not only geospatial file paths. (:pull:`174`).
* Allow searching for periods in `catalog.search`. (:issue:`123`, :pull:`170`).
* Allow searching and extracting multiple frequencies for a given variable. (:issue:`168`, :pull:`170`).
* New masking feature in ``extract_dataset``. (:issue:`180`, :pull:`182`).
* New method "sel" in ``xs.extract.clisops_subset``. (:issue:`180`, :pull:`182`).

Breaking changes
^^^^^^^^^^^^^^^^
41 changes: 39 additions & 2 deletions xscen/extract.py
@@ -58,7 +58,9 @@ def clisops_subset(ds: xr.Dataset, region: dict) -> xr.Dataset:
-----
'region' fields:
method: str
['gridpoint', 'bbox', shape']
['gridpoint', 'bbox', 'shape', 'sel']
If the method is `sel`, this is not a call to clisops but only a subsetting with the xarray .sel() function.
The keys are the dimensions to subset and the values are turned into a slice.
<method>: dict
Arguments specific to the method used.
buffer: float, optional
@@ -131,6 +133,16 @@ def clisops_subset(ds: xr.Dataset, region: dict) -> xr.Dataset:
f" - clisops v{clisops.__version__}"
)

    elif region["method"] in ["sel"]:
        arg_sel = {
            dim: slice(*map(float, bounds)) for dim, bounds in region["sel"].items()
        }
        ds_subset = ds.sel(**arg_sel)
        new_history = (
            f"[{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] "
            f"{region['method']} subsetting with arguments {arg_sel}"
        )

    else:
        raise ValueError("Subsetting type not recognized")
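
To make the new `sel` branch concrete, here is a minimal sketch of how a `region` dictionary is turned into slices. The dictionary contents (`name`, the `lat`/`lon` bounds) are made-up illustration values, not taken from the PR; only the comprehension is copied from the diff. It uses plain Python so the conversion can be inspected without a dataset.

```python
# Hypothetical region dictionary; the keys under "sel" are dimension names.
region = {
    "name": "example",  # illustrative value, not from the PR
    "method": "sel",
    "sel": {"lat": [45, 50], "lon": ["-80", "-70"]},
}

# Same comprehension as in the diff: each pair of bounds becomes a float slice.
arg_sel = {
    dim: slice(*map(float, bounds)) for dim, bounds in region["sel"].items()
}
print(arg_sel)

# Because of the float() cast, non-numeric bounds such as date strings fail.
try:
    slice(*map(float, ["2000-01-01", "2010-12-31"]))
    date_bounds_ok = True
except ValueError:
    date_bounds_ok = False
print(date_bounds_ok)
```

The resulting `arg_sel` is what the branch passes to `ds.sel(**arg_sel)`. One consequence of the `float` cast, as the second half of the sketch suggests, is that this method appears suited to numeric coordinates only.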

@@ -157,6 +169,7 @@ def extract_dataset(
xr_combine_kwargs: dict = None,
preprocess: Callable = None,
resample_methods: Optional[dict] = None,
mask: Union[bool, xr.Dataset] = False,
Collaborator

Could we also accept a DataArray?

I don't find the "bool" option very useful, but I see that #183 would complete this feature.

Contributor Author

Sure, I can accept a DataArray.
I guess the "bool" option is only useful to me for now, since I already have a mask in the catalogue.

) -> Union[dict, xr.Dataset]:
"""Take one element of the output of `search_data_catalogs` and return a dataset, performing conversions and resampling as needed.

@@ -197,6 +210,12 @@ def extract_dataset(
If the method is not given for a variable, it is guessed from the variable name and frequency,
using the mapping in CVs/resampling_methods.json. If the variable is not found there,
"mean" is used by default.
mask: xr.Dataset, bool
A mask that is applied to all variables and only keeps data where it is True.
Where the mask is False, variable values are replaced by NaNs.
The mask should have the same dimensions as the variables extracted.
If `mask` is a dataset, the dataset should have a variable named 'mask'.
If `mask` is True, it will expect a `mask` variable at xrfreq `fx` to have been extracted.

Returns
-------
@@ -211,7 +230,7 @@ def extract_dataset(
name: str
Region name used to overwrite domain in the catalog.
method: str
['gridpoint', 'bbox', shape']
['gridpoint', 'bbox', 'shape', 'sel']
<method>: dict
Arguments specific to the method used.
buffer: float, optional
@@ -387,6 +406,24 @@ def extract_dataset(

out_dict[xrfreq] = ds

    if mask:
        if isinstance(mask, xr.Dataset):
            ds_mask = mask["mask"]
        elif (
            "fx" in out_dict and "mask" in out_dict["fx"]
        ):  # get mask that was extracted above
            ds_mask = out_dict["fx"]["mask"].copy()
        else:
            raise ValueError(
                "No mask found. Either pass a xr.Dataset to the `mask` argument or pass a `dc` that includes a dataset with a variable named `mask`."
            )

        # iter over all xrfreq to apply the mask
        for xrfreq, ds in out_dict.items():
            out_dict[xrfreq] = ds.where(ds_mask)
            if xrfreq == "fx":  # put back the mask
                out_dict[xrfreq]["mask"] = ds_mask

    return out_dict
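
The masking loop can be tried on toy data. The sketch below is an illustration under assumptions, not the library's test: `out_dict` is a hand-built stand-in for the dictionary assembled inside `extract_dataset`, and the variable names `sftlf` and `tas`, the `"D"` key, and the grid size are all invented for the example. It assumes `numpy` and `xarray` are installed.

```python
import numpy as np
import xarray as xr

# Toy boolean mask on a 2x2 grid (hypothetical values).
mask = xr.DataArray(
    np.array([[True, False], [True, True]]), dims=("lat", "lon"), name="mask"
)

# Hand-built stand-in for the per-frequency dictionary (keys are xrfreq strings).
out_dict = {
    "fx": xr.Dataset({"mask": mask, "sftlf": (("lat", "lon"), np.ones((2, 2)))}),
    "D": xr.Dataset(
        {"tas": (("time", "lat", "lon"), np.arange(12.0).reshape(3, 2, 2))}
    ),
}

# Same logic as the diff for the `mask=True` case.
ds_mask = out_dict["fx"]["mask"].copy()
for xrfreq, ds in out_dict.items():
    out_dict[xrfreq] = ds.where(ds_mask)  # the mask broadcasts over `time`
    if xrfreq == "fx":
        out_dict[xrfreq]["mask"] = ds_mask  # restore the boolean mask

masked_tas = out_dict["D"]["tas"]
print(masked_tas.isel(time=0).values)
```

Restoring `mask` in the `fx` dataset matters because `where` would otherwise apply the mask to the mask variable itself, replacing its `False` cells with NaN and losing the boolean dtype.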

