
[FEAT] Evaluation to utils #311

Merged
merged 6 commits into main from feat/eval_to_utils on Dec 12, 2024

Conversation

@elephaint (Contributor) commented Dec 2, 2024

PR:

  • Softly deprecates the HierarchicalEvaluation class in favor of a function evaluate, which is a small wrapper around utilsforecast's evaluate and works with utilsforecast's evaluation functions. This further unifies the HF API across our packages (see the usage sketch after this list).
  • The deprecation is soft in the sense that, for now, we keep the HierarchicalEvaluation class and loss functions but remove them from the examples and the docs, and I added a deprecation notice to each.
  • Updates all examples to show the new evaluation method.
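For concreteness, here is a minimal sketch of how the new function-based API is meant to be used. The toy data, the tags argument, and the exact signature are assumptions based on this description and utilsforecast's evaluate, not a verbatim copy of the merged code:

```python
# Hedged sketch of the new function-based evaluation API. Assumes
# hierarchicalforecast.evaluation.evaluate mirrors utilsforecast's
# evaluate with an extra `tags` argument for the hierarchy levels;
# check the merged code for the exact signature.
import numpy as np
import pandas as pd
from utilsforecast.losses import mse

from hierarchicalforecast.evaluation import evaluate

# Toy long-format frame: actuals `y` plus one column per model.
df = pd.DataFrame({
    'unique_id': ['total', 'total', 'a', 'a', 'b', 'b'],
    'ds': [1, 2] * 3,
    'y': [10.0, 12.0, 6.0, 7.0, 4.0, 5.0],
    'Naive': [9.0, 10.0, 5.0, 6.0, 4.0, 4.0],
    'MinTrace': [10.5, 11.5, 6.2, 6.8, 4.1, 4.9],
})
# Hierarchy levels -> member series, as produced by aggregate().
tags = {'Total': np.array(['total']), 'Bottom': np.array(['a', 'b'])}

evaluation = evaluate(df=df, metrics=[mse], tags=tags)
print(evaluation)  # one row per (hierarchy level, metric), one column per model
```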

Note:
utilsforecast's scaled_crps is different from what is currently in hierarchicalforecast: the former normalizes on a per-series basis, whereas the GluonTS implementation (which hierarchicalforecast currently follows) normalizes the CRPS using a norm computed over all time series. Therefore, when recalculating our examples using the scaled CRPS of utilsforecast, we get different error metrics (and different conclusions!). The toy example below illustrates the difference.
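This is plain numpy, not the library formulas: with one small and one large series, per-series normalization weights both series equally, while the pooled norm lets the large series dominate.

```python
# Illustration (not library code) of the two normalizations, using an
# unnormalized CRPS value per series and per-series absolute-target sums.
import numpy as np

crps = np.array([2.0, 50.0])           # unnormalized CRPS per series
abs_scale = np.array([10.0, 1000.0])   # sum of |y| per series

# Per-series normalization (utilsforecast-style): scale each series
# by its own norm, then average across series.
per_series = np.mean(crps / abs_scale)   # (0.2 + 0.05) / 2 = 0.125

# Global normalization (GluonTS-style, current hierarchicalforecast):
# pool numerator and denominator across all series.
pooled = crps.sum() / abs_scale.sum()    # 52 / 1010 ≈ 0.0515

print(per_series, pooled)  # the two scores (and rankings) can diverge
```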

Todo / to solve:

  • Using utilsforecast, we evaluate slightly differently than we currently do in HF, in particular when using benchmark models in the evaluation metric. We previously computed a relative score as (overall_scalar_loss / overall_scalar_loss_benchmark), whereas with utilsforecast's loss functions we compute the relative loss per time series and then take the mean across those relative losses. Added a benchmark option to evaluate to obtain the original behavior; the toy example below shows how the two aggregations can disagree.
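A toy illustration of why the two aggregations differ:

```python
# Toy illustration of the two ways of computing a relative score.
import numpy as np

model_loss = np.array([1.0, 10.0])       # per-series loss of the model
benchmark_loss = np.array([2.0, 5.0])    # per-series loss of the benchmark

# Old HierarchicalEvaluation behavior: ratio of overall (aggregated) losses.
overall_relative = model_loss.mean() / benchmark_loss.mean()   # 5.5 / 3.5 ≈ 1.571

# utilsforecast-style: per-series ratios, then the mean across series.
per_series_relative = np.mean(model_loss / benchmark_loss)     # (0.5 + 2.0) / 2 = 1.25

print(overall_relative, per_series_relative)  # the two can disagree materially
```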

The following loss functions should be added to utilsforecast to achieve parity with current HF functionality:

  • log_score: requires a set of inputs that's difficult to provide in utilsforecast without some fiddling. Need to think about what this score adds beyond scaled_crps and energy_score; I don't think it adds much, to be honest.
  • energy_score: deferred for now. The loss function can still be used via the old API; it's removed from the examples, keeping only scaled_crps.
  • msse: already in utilsforecast.
  • rel_mse: not including this one in utilsforecast; users should use mse in conjunction with the benchmark argument of evaluate to get the proper relative MSE (see the sketch below).
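A hedged sketch of the rel_mse replacement, reusing the toy df and tags from the first sketch above. The benchmark keyword and the output naming are taken from this description and may differ from the merged code:

```python
# Sketch of a relative MSE via the benchmark option added in this PR.
# `benchmark` is assumed to take the column name of the reference model;
# the exact keyword and output column naming are not verified against
# the merged signature.
from utilsforecast.losses import mse

from hierarchicalforecast.evaluation import evaluate

# df and tags as in the earlier sketch: long-format actuals/forecasts
# with a 'Naive' column to act as the benchmark model.
evaluation = evaluate(
    df=df,
    metrics=[mse],
    tags=tags,
    benchmark='Naive',  # divide each model's overall MSE by Naive's MSE
)
# Scores are then relative: values below 1 beat the benchmark.
```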


@elephaint marked this pull request as ready for review December 2, 2024 14:58
@elephaint changed the title [FEAT] Evalutation to utils [FEAT] Evaluation to utils Dec 2, 2024
@elephaint marked this pull request as draft December 2, 2024 19:37
@elephaint marked this pull request as ready for review December 11, 2024 14:33
@elephaint requested a review from jmoralez December 11, 2024 14:35
@jmoralez previously approved these changes Dec 11, 2024
hierarchicalforecast/evaluation.py: review comments (outdated, resolved)
@elephaint merged commit 0cad0a1 into main Dec 12, 2024
17 checks passed
@elephaint deleted the feat/eval_to_utils branch December 12, 2024 19:31