Imp/tsmixer basic #2555
base: master
Conversation
Hi @eschibli, First of all, thanks for opening this PR! For the linting, it will make your life much easier if you follow these instructions, or you can also run it manually.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #2555      +/-   ##
==========================================
- Coverage   94.15%   94.10%   -0.05%
==========================================
  Files         139      139
  Lines       14992    15006      +14
==========================================
+ Hits        14116    14122       +6
- Misses        876      884       +8
```
Thanks @madtoinou. I was not able to get Gradle running on my machine and didn't realize ruff was that easy to set up, so sorry for spamming your test pipeline. I don't believe the failing mac build is a result of my changes, so it should be good for review now.
Hi @eschibli, thanks for the PR. Yes, the failing mac tests are unrelated to your PR, we're working on it :).
Understood, Dennis.
It looks great, thank you for this nice PR @eschibli! Some minor comments about the order of the operations/projections to make the flow more intuitive.
Could you also extend the TSMixer notebook to include a section where the difference in performance with `project_first_layer=True/False` and future covariates can be visualized?
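A rough sketch of what that notebook section could look like; `train`, `val`, and `future_cov` are assumed to be existing notebook variables, and the `project_first_layer` argument is the option under review in this PR rather than an established darts parameter:

```python
from darts.metrics import mape
from darts.models import TSMixerModel

results = {}
for project_first in (True, False):
    model = TSMixerModel(
        input_chunk_length=48,
        output_chunk_length=12,
        project_first_layer=project_first,  # option added in this PR (assumed name)
        n_epochs=20,
    )
    model.fit(train, future_covariates=future_cov)
    pred = model.predict(n=len(val), future_covariates=future_cov)
    results[project_first] = mape(val, pred)

# compare accuracy with and without first-layer temporal projection
print(results)
```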
```python
x = _time_to_feature(x)

# Otherwise, encoder-style model with residual blocks in input time dimension
# In the original paper this was not implimented for future covariates,
```
Suggested change:

```diff
- # In the original paper this was not implimented for future covariates,
+ # In the original paper this was not implemented for future covariates,
```
```python
# In the original paper this was not implimented for future covariates,
# but rather than ignoring them or raising an error we remap them to the input time dimension.
# Suboptimal but may be useful in some cases.
elif self.future_cov_dim:
```
To make it a bit more intuitive, I would move this code below, inside the `if self.future_cov_dim`, and change the condition to `if not self.project_first_layer`, in order to group the operations on each kind of feature (see the sketch after this list):
- "target": project to the output time dimension in the first layer if `project_first_layer=True`, otherwise stay in the input time dimension
- "target": do the `feature_mixing_hist` (not changed)
- "fut_cov": project the future covariates to the input time dimension if `project_first_layer=False` (the logic you added)
- concatenate the future covariates to the target features (not changed)
- static covariates (not changed)
- "target": projection to the output time dimension if it did not occur earlier
- "target": application of `fc_out`, critical for probabilistic forecasts
```python
x = mixing_layer(x, x_static=x_static)

# If we are in the input time dimension, we need to project to the output time dimension.
# The original paper did not a fc_out layer (as hidden_size == output_dim)
```
Suggested change:

```diff
- # The original paper did not a fc_out layer (as hidden_size == output_dim)
+ # The original paper did not use a fc_out layer (as hidden_size == output_dim)
```
```python
if project_first_layer:
    assert model.model.sequence_length == output_len
else:
    assert model.model.sequence_length == input_len
```
Can the test also include a call to `predict()` to make sure it works as well (even if the forward pass is already occurring in the call to `fit()`)?
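A minimal sketch of what the extended test could look like. `TSMixerModel.fit()`/`predict()` are the real darts API; the `project_first_layer` argument is the option under review, and `series`, `input_len`, and `output_len` are assumed test parameters from the quoted snippet:

```python
from darts.models import TSMixerModel

# `series` is assumed to be a darts TimeSeries fixture long enough to train on
model = TSMixerModel(
    input_chunk_length=input_len,
    output_chunk_length=output_len,
    project_first_layer=project_first_layer,  # option under review in this PR
    n_epochs=1,
)
model.fit(series)

if project_first_layer:
    assert model.model.sequence_length == output_len
else:
    assert model.model.sequence_length == input_len

# exercise the full inference path, not just the forward pass inside fit()
pred = model.predict(n=output_len)
assert len(pred) == output_len
```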
Implements #2510
Summary
Adds the option to project to the output temporal space at the end of TS-Mixer rather than at the beginning. This is how most of the results in the original google-research paper were achieved (i.e., the architecture in Fig. 1 of the paper). It may allow higher performance in cases where past covariates are important, by allowing a more direct series of residual connections along the input time dimension.
I added support for future covariates by instead projecting them into the lookback temporal space, but this probably won't perform well in cases where they are more important than the historical targets and past covariates.
Other Information
The original paper and source code do not clarify whether the final temporal projection should go before or after the final feature projection, as they hardcoded `hidden_size` to `output_dim` and therefore did not need a final feature projection. I erred on the side of putting the temporal projection first, as otherwise the common `output_dim == 1` could lead to unexpected, catastrophic compression before the temporal projection step.
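A small shape demonstration of that ordering argument. The dimensions are illustrative, and `time_proj`/`fc_out` are stand-ins for the model's temporal and final feature projections:

```python
import torch

batch, input_len, output_len, hidden, output_dim = 8, 48, 12, 64, 1

x = torch.randn(batch, input_len, hidden)
time_proj = torch.nn.Linear(input_len, output_len)  # temporal projection
fc_out = torch.nn.Linear(hidden, output_dim)        # final feature projection

# temporal projection first: all hidden channels survive the time remap
y = time_proj(x.transpose(1, 2)).transpose(1, 2)  # (8, 12, 64)
y = fc_out(y)                                     # (8, 12, 1)

# reversed order: with output_dim == 1, everything is squeezed through a
# single channel before the temporal projection can mix time steps
z = fc_out(x)                                     # (8, 48, 1)
z = time_proj(z.transpose(1, 2)).transpose(1, 2)  # (8, 12, 1)
```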