PR:

Deprecates the `HierarchicalEvaluation` class in favor of a function `evaluate`, which is a small wrapper around utilsforecast's `evaluate` and therefore works with utilsforecast's evaluation functions. This further unifies the HF API across our packages. I kept the `HierarchicalEvaluation` class and loss functions but removed them from the examples and the docs, and added a deprecation notice to each.
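For illustration, a minimal sketch of what a thin wrapper around utilsforecast's `evaluate` could look like; this is not the implementation in this PR, and the `tags` handling and argument names are assumptions made for the example.

```python
# Sketch only -- assumes utilsforecast is installed and that hierarchy levels are
# passed as a `tags` dict mapping level name -> array of series ids.
import pandas as pd
from utilsforecast.evaluation import evaluate as uf_evaluate


def evaluate(df, metrics, tags=None, train_df=None, id_col="unique_id", **kwargs):
    """Evaluate hierarchical forecasts with utilsforecast's loss functions."""
    if tags is None:
        # No hierarchy information: defer entirely to utilsforecast.
        return uf_evaluate(df, metrics=metrics, train_df=train_df, id_col=id_col, **kwargs)
    results = []
    for level_name, level_ids in tags.items():
        level_df = df[df[id_col].isin(level_ids)]
        level_train = train_df[train_df[id_col].isin(level_ids)] if train_df is not None else None
        res = uf_evaluate(level_df, metrics=metrics, train_df=level_train, id_col=id_col, **kwargs)
        res.insert(0, "level", level_name)  # record which aggregation level the rows belong to
        results.append(res)
    return pd.concat(results, ignore_index=True)
```

Called e.g. as `evaluate(forecasts_df, metrics=[mse, scaled_crps], tags=tags, train_df=Y_train_df, level=[80, 90])` (variable names hypothetical), with the metrics imported from `utilsforecast.losses`.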
Note:

utilsforecast's `scaled_crps` is different from what is currently in hierarchicalforecast: the former normalizes on a per-series basis, whereas the GluonTS implementation (which hierarchicalforecast currently follows) normalizes the CRPS using the norm based on all timeseries. Therefore, when recalculating our examples using the scaled CRPS of utilsforecast, we get different error metrics (and different conclusions!).
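To make the difference concrete, here is a toy numeric sketch with made-up numbers (the per-quantile details of the actual definitions are omitted):

```python
import numpy as np

# Made-up aggregate CRPS and mean absolute target per series; the second series
# lives on a much larger scale than the first.
crps = np.array([1.0, 10.0])
abs_y = np.array([2.0, 100.0])

# utilsforecast-style: normalize each series by its own norm, then average.
per_series = np.mean(crps / abs_y)   # (0.50 + 0.10) / 2 = 0.30

# GluonTS-style (current hierarchicalforecast): normalize the pooled CRPS by the
# norm computed over all series.
pooled = crps.sum() / abs_y.sum()    # 11 / 102 ≈ 0.108
```

The per-series convention gives small-scale series a much larger weight, which is why the recalculated examples can end up with different conclusions.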
Todo / to solve:

Using utilsforecast, we evaluate slightly differently than we currently do with HF, in particular when using benchmark models in the evaluation metric. We previously computed a relative score as `(overall_scalar_loss / overall_scalar_loss_benchmark)`, whereas with utilsforecast's loss functions we compute the relative loss per timeseries and then take the mean across those relative losses. Added a `benchmark` possibility in `evaluate` to obtain the correct behavior.
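A schematic of the two aggregation orders, with made-up per-series losses:

```python
import numpy as np

mse_model = np.array([1.0, 9.0])      # made-up per-series mse of the model
mse_benchmark = np.array([2.0, 3.0])  # made-up per-series mse of the benchmark

# Previous HF behavior: ratio of the overall (pooled) losses.
rel_overall = mse_model.mean() / mse_benchmark.mean()   # 5.0 / 2.5 = 2.0

# Plain utilsforecast aggregation: per-series ratios, then the mean of those ratios.
rel_per_series = (mse_model / mse_benchmark).mean()     # (0.5 + 3.0) / 2 = 1.75
```

The two coincide when the benchmark loss is identical across series, but not in general.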
The following loss functions should be added to utilsforecast to achieve parity with current HF functionality:

- `log_score`: requires a set of inputs that's difficult to achieve in utilsforecast without some fiddling. Need to think about what this score adds beyond `scaled_crps` and `energy_score`; don't think it adds much, tbh.
- `energy_score`: deferred for now. The loss function can still be used via the old API, and it is removed from the examples, keeping only `scaled_crps`.
- `msse`: already in utilsforecast.
- `rel_mse`: not including this one in utilsforecast; users should use `mse` in conjunction with the `benchmark` attribute in `evaluate` to get the proper relative mse (see the usage sketch after this list).
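For the `rel_mse` replacement, a hedged usage sketch; the `benchmark` argument is the one added in this PR, while `tags` and the variable names are assumptions to be checked against the final API:

```python
from utilsforecast.losses import mse

# `forecasts_df` holds unique_id/ds/y plus one column per model, including a "Naive" column.
evaluation = evaluate(
    forecasts_df,
    metrics=[mse],
    tags=tags,          # hierarchy levels, e.g. as returned by aggregate()
    benchmark="Naive",  # assumed semantics: report each model's mse relative to the Naive column
)
```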