[ENH] Replace SFA with SFAFast in REDCOMETS #2418

Open
itsdivya1309 wants to merge 1 commit into base: main

itsdivya1309
Contributor

@itsdivya1309 itsdivya1309 commented Nov 30, 2024

Reference Issues/PRs

Closes: #1742

What does this implement/fix? Explain your changes.

Replace SFA with the faster version, SFAFast.

Does your contribution introduce a new dependency? If yes, which one?

None

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.

@aeon-actions-bot added the classification and enhancement labels Nov 30, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ].
I have added the following labels to this PR based on the changes made: [ classification ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Hi, could you do a small evaluation on how this impacts the words output and the overall accuracy/speed of the classifier? Some of the smaller UCR datasets should be fine.

This may be of interest to @zy18811. No need to do the evaluation if they are fine with the change, but I wouldn't want to make a big change without them either knowing about it or evaluating it.

@patrickzib
Contributor

patrickzib commented Dec 4, 2024

Hi, very cool. I am a bit surprised that there are no changes required on the test cases for REDCOMETS.

@MatthewMiddlehurst is there a way to check whether test_redcomets.py was re-evaluated?

@zy18811
Contributor

zy18811 commented Dec 4, 2024

@itsdivya1309 thanks for doing this PR.

My (very brief) look suggests that the word outputs are identical between SFA and SFAFast in the case of the whole-series transform that REDCOMETS uses, so there is absolutely no difference in accuracy (hence the old test cases still pass).

It seems to give a small speedup most of the time, but is sometimes slower - not sure if this is additional numba overhead or something else. It could do with a slightly more thorough analysis to confirm SFAFast is an improvement overall wrt. speed.
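One way to separate possible numba compilation overhead from steady-state speed is to time the first call on its own and then take the median of later calls. The sketch below is only an illustration of that idea: `compare_timings`, `slow_sum` and `fast_sum` are hypothetical stand-ins, not code from this PR. In practice the two callables would wrap fitting and transforming with SFA and SFAFast on the same data.

```python
import statistics
import time


def time_calls(func, *args, repeats=5):
    """Return (first_call_seconds, median_of_later_calls_seconds).

    The first call is timed separately because, for a numba-compiled
    function, it would include JIT compilation (assumption: no warm cache).
    """
    start = time.perf_counter()
    func(*args)
    first = time.perf_counter() - start

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return first, statistics.median(samples)


def compare_timings(baseline, candidate, *args, repeats=5):
    """Time two callables on the same arguments and report the speedup."""
    b_first, b_steady = time_calls(baseline, *args, repeats=repeats)
    c_first, c_steady = time_calls(candidate, *args, repeats=repeats)
    # Guard against a zero reading from the timer on very fast calls.
    speedup = b_steady / c_steady if c_steady > 0 else float("inf")
    return {
        "baseline_first": b_first,
        "baseline_steady": b_steady,
        "candidate_first": c_first,
        "candidate_steady": c_steady,
        "steady_speedup": speedup,
    }


# Hypothetical stand-ins for "fit and transform with SFA / SFAFast".
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(compare_timings(slow_sum, fast_sum, 100_000))
```

If the candidate's first call is much slower than its later calls while the steady-state speedup is above 1, the occasional slowdown is more likely compilation cost than the transform itself.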

I also remember that the last time I looked at doing this there were some issues with particular alphabet sizes and word lengths on some datasets, due to how SFA/SFAFast stores the words and numba having a max integer size. I'll have a look and see if I can replicate any of this - but I can see that @patrickzib has made a number of improvements to SFAFast since I last looked, so it might be fixed anyway.

@MatthewMiddlehurst
Member

@zy18811 the class will still be limited in the alphabet sizes and word lengths it supports, due to the 64-bit limit on words required by the functions. The original SFA does not have this limit.
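To make the 64-bit limit concrete, here is a rough sketch of the arithmetic, assuming each word is packed into a single 64-bit integer with a fixed number of bits per letter. That packing scheme, and the helper names `bits_per_letter`, `word_fits` and `pack_word`, are assumptions for illustration and are not taken from the SFAFast source.

```python
MAX_WORD_BITS = 64  # assumption: one unsigned 64-bit integer per word


def bits_per_letter(alphabet_size: int) -> int:
    """Bits needed to encode one letter from an alphabet of this size."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    return (alphabet_size - 1).bit_length()


def word_fits(word_length: int, alphabet_size: int,
              max_bits: int = MAX_WORD_BITS) -> bool:
    """True if a word of this length fits into max_bits bits."""
    return word_length * bits_per_letter(alphabet_size) <= max_bits


def pack_word(letters, alphabet_size: int) -> int:
    """Pack a sequence of letter indices into one integer, first letter highest."""
    if not word_fits(len(letters), alphabet_size):
        raise OverflowError("word does not fit into 64 bits")
    width = bits_per_letter(alphabet_size)
    word = 0
    for letter in letters:
        if not 0 <= letter < alphabet_size:
            raise ValueError("letter index outside the alphabet")
        word = (word << width) | letter
    return word


if __name__ == "__main__":
    print(word_fits(8, 4))      # 8 letters * 2 bits = 16 bits
    print(word_fits(16, 256))   # 16 letters * 8 bits = 128 bits
    print(pack_word([1, 2, 3], 4))
```

Under this assumption, the product of word length and bits per letter is what has to stay within 64. If the words were stored in a signed integer instead, `max_bits=63` would be the safer budget.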

@zy18811
Contributor

zy18811 commented Jan 15, 2025

@MatthewMiddlehurst

I've done some quick analysis (40 datasets, 10 resamples), and it looks like SFAFast has identical accuracy with a speedup.

[Figure: sfa_fast_acc]
[Figure: sfa_fast_time]

Regarding the numba errors discussed above, I think I ran into the issues when experimenting with different alphabet size ranges, but I believe the current range should have no problems.

Overall, PR looks good to me - thanks @itsdivya1309.

Contributor

@TonyBagnall TonyBagnall left a comment

LGTM, thanks. Wonder if the same can be done with TDE?

@patrickzib
Contributor

> I've done some quick analysis, 40 datasets 10 resamples, and it looks like SFA Fast has identical accuracy with a speedup.

That is very cool!
