Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-dimensional thresholds #1236

Merged
merged 12 commits into from
Dec 21, 2022
Merged

Multi-dimensional thresholds #1236

merged 12 commits into from
Dec 21, 2022

Conversation

aulemahal
Copy link
Collaborator

@aulemahal aulemahal commented Nov 10, 2022

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • HISTORY.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

Thresholds and other quantities passed as parameters of indicators can now be multi-dimensional DataArrays. xArray broadcasting mechanisms will apply. Those parameters are now annotated as "Quantity" in the signatures (xclim.core.utils.Quantity), instead of "str" as before. Attributes where such thresholds where included will now read "<an array>" (french: "<une matrice>") for these new cases.

Does this PR introduce a breaking change?

No, I don't think so!

As expected, it was quite easy to implement this! Only a few indice needed modifications, and only a few are not compatible :
fire_weather_indices (because of the use of map_blocks) and [snow|rain]fall_approximation (but only with the "brown" method). Both cases shouldn't be where this feature is used the most.

Other information:

I did not add any checks about the dimensions of these thresholds. I thought we didn't need to be stricter than xarray. And that xclim users should expect xarray behaviour.

For example: if tas is ('lon', 'lat', 'time') and thresh is ('lon',) then it gets broadcasted along the lat and time dimensions automatically. But if the two lon coordinates differ, then a combined (union) lon coordinate is found in the output. And I made the choice that it's not our fault, nor is it for us to detect.

@JeremyFyke if you have time to test this branch for your heat wave need, that would be wonderful. Writing tests for this will be harder than implementing it, simply because of the large number of indicators involved...

@github-actions github-actions bot added the indicators Climate indices and indicators label Nov 10, 2022
@bzah
Copy link
Contributor

bzah commented Dec 18, 2022

Removing PercentileDataArray would break a few things in icclim but nothing too serious I believe.
ping @pagecp

@aulemahal
Copy link
Collaborator Author

Indeed, sorry I should have tagged you guys when I made the change.

On our end, there were two motivations:

  • Having a subclass of xr.DataArray breaks some xarray methods that either strictly expect DataArray instances or return DataArray instances no matter the input's class. Maybe @tlogan2000 remembers the bug better. I guess that this is a bug upstream but
  • the class seemed not to bring much functionality, especially with the new "Quantity" annotation. To me the combo xc.core.utils.is_percentile_dataarray and xc.core.formatting.get_percentile_metadata seemed like a smaller impact version of the same idea.

But if this breaks to many things in icclim, we can figure out a smoother way to fix the upstream bug while retaining the use you make of it.

@github-actions github-actions bot added the docs Improvements to documenation label Dec 19, 2022
@aulemahal aulemahal marked this pull request as ready for review December 19, 2022 20:23
Copy link
Collaborator

@Zeitsperre Zeitsperre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat addition!

@github-actions github-actions bot added the approved Approved for additional tests label Dec 19, 2022
Copy link
Contributor

@bzah bzah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat indeed!
I would only suggest to use another name for Quantity as it's shadowing pint's Quantity and could be confusing (especially since it is already confusing between pint.Quantity and units.Quantity)
Maybe "Threshold" would make sense or "QuantifiedThreshold" ?

@aulemahal
Copy link
Collaborator Author

aulemahal commented Dec 20, 2022

Indeed you are right on this.
I'm not sure about "Threshold" because it has become more than that here (ex: wb_cal in water_budget ). So this type is for anything that isn't a "timeseries" variable.

  • QuantityType ? (we aren't using the "type" suffix anywhere else though)
  • Parameter ? (now confusing with the Indicator class itself where all arguments are called "parameters", but that's hidden for most users. It also doesn't reflect the need for units)
    • ParamVar?
  • Quantified ?

@tlogan2000
Copy link
Collaborator

tlogan2000 commented Dec 20, 2022

I haven't dived into the code yet but basically the 'threshold' in all cases will now be an array right? ... How about ArrayValue or something? or ArrayParam ?

@aulemahal
Copy link
Collaborator Author

That's the other thing : the goal is to allow both arrays and scalar values. So the object can be either one of

  • a string (value + units) (the most common case)
  • a pint.Quantity (scalar with units)
  • a DataArray with units (the novelty of this PR)

In most cases, it wouldn't be an array.

@tlogan2000
Copy link
Collaborator

I think the small bug with the PercentileDataArray was when I tried to plot an annual cycle directly ... something like doy as x-axis with percentiles values over the year in different curves .. e.g. doy_perc.plot.line(x='dayofyear') . I don't think it is critical to change as can be worked around fairly easily and is not something that is done too often.

@JeremyFyke
Copy link
Contributor

Sorry for late arrival here - thanks @aulemahal for this work! I will definitely try this out during my next foray into the heat wave work. In general, this capability is really relevant to on-the-ground work, even outside of heatwaves.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@bzah
Copy link
Contributor

bzah commented Dec 20, 2022

I like "Quantified" best.

As for PercentileDataArray, it would make sense to simply migrate it into icclim code base as it is used in icclim own Threshold class.

@github-actions github-actions bot added the CI Automation and Contiunous Integration label Dec 20, 2022
@aulemahal aulemahal merged commit 6d43ef1 into master Dec 21, 2022
@aulemahal aulemahal deleted the non-scalar-thresholds branch December 21, 2022 19:50
@bzah bzah mentioned this pull request Jan 16, 2023
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests CI Automation and Contiunous Integration docs Improvements to documenation indicators Climate indices and indicators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

per-gridcell heat wave indice calculations?
5 participants