
[ENH] Added R-Clustering clusterer to aeon #2382


Open · wants to merge 140 commits into main

Conversation

@Ramana-Raja (Contributor) commented Nov 22, 2024

Reference Issues/PRs

#2132

What does this implement/fix? Explain your changes.

Added the R-Clustering model to aeon.

Does your contribution introduce a new dependency? If yes, which one?

No.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.
For new estimators and functions
  • I've added the estimator to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access
  • (OPTIONAL) I've updated aeon's CODEOWNERS to receive notifications about future changes to these files.

@aeon-actions-bot (bot) added the clustering (Clustering package) and enhancement (New feature, improvement request or other non-bug code enhancement) labels Nov 22, 2024
@aeon-actions-bot (bot) commented:

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ $\color{#FEF1BE}{\textsf{enhancement}}$ ].
I have added the following labels to this PR based on the changes made: [ $\color{#4011F3}{\textsf{clustering}}$ ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

@Ramana-Raja changed the title from "[ENH] Added R-Clustering clusterer to aeon for issue #2132" to "[ENH] Added R-Clustering clusterer to aeon #2132" Nov 22, 2024
@Ramana-Raja changed the title from "[ENH] Added R-Clustering clusterer to aeon #2132" to "[ENH] Added R-Clustering clusterer to aeon" Nov 22, 2024
@TonyBagnall (Contributor) commented:

Hi, thanks for this, but if we include this clusterer we want it to use our version of the Rocket transformers, which are optimised for numba.

@Ramana-Raja (Contributor, Author) commented:

Hi, thanks for this, but if we include this clusterer we want it to use our version of the Rocket transformers, which are optimised for numba.

Sure, I will try to reimplement it using the aeon Rocket transformers.

@Ramana-Raja (Contributor, Author) commented:

@MatthewMiddlehurst I've resolved the PCA issue, and all test cases are now passing. I also added random_state to the test-case default parameters, similar to other clustering models in aeon, to fix the test failure where estimator.labels_ was not matching estimator.predict(data). If you have some time, could you review the code and let me know if any improvements are needed?

@Ramana-Raja (Contributor, Author) commented:


Hi @MatthewMiddlehurst, just checking in and kindly following up on this PR when you have a moment.

@MatthewMiddlehurst (Member) left a comment


Hi, this is a complex PR and the project is currently very busy so it is unlikely this will be in soon. I have left a few comments but I don't imagine they will be the last.

I see you linked some results above at some point, but this seems to be for just one of the estimators? Not sure if that is this or the original code. One of the things I am going to ask for before merging is a comparison for both this and the original, so that is also something you can do.

Comment on lines +409 to +429
def check_params(self, X):
    """
    Check and adjust parameters related to multiprocessing.

    Parameters
    ----------
    X : np.ndarray
        Input data.

    Returns
    -------
    np.ndarray
        Processed input data with float32 type.
    """
    X = X.astype(np.float32)
    if self.n_jobs < 1 or self.n_jobs > multiprocessing.cpu_count():
        n_jobs = multiprocessing.cpu_count()
    else:
        n_jobs = self.n_jobs
    set_num_threads(n_jobs)
    return X
Member review comment:

I do not think this is required. Use the check_n_jobs utility. If you are setting numba threads, make sure to set them back to the original value when done.
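A minimal sketch of what the reviewer describes, assuming check_n_jobs is importable from aeon.utils.validation and using numba's get_num_threads/set_num_threads; the _run_with_n_threads helper name is hypothetical:

from numba import get_num_threads, set_num_threads

from aeon.utils.validation import check_n_jobs  # assumed import path


def _run_with_n_threads(requested_n_jobs, work):
    """Toy helper: run `work()` with a temporary numba thread count."""
    n_jobs = check_n_jobs(requested_n_jobs)  # resolves -1 / out-of-range values
    prev_threads = get_num_threads()         # remember the caller's numba setting
    set_num_threads(n_jobs)
    try:
        return work()                        # the numba-parallel transform goes here
    finally:
        set_num_threads(prev_threads)        # restore the original thread count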

Member review comment:

What changed? This looks the same.

from aeon.clustering.feature_based._r_cluster import RClusterer
from aeon.datasets import load_gunpoint

X_ = [
Member review comment:

If this is randomly generated data, use the data generation utility in the individual test instead. If it is not, what is the source?
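For reference, a sketch of what using the testing utility might look like, assuming make_example_3d_numpy from aeon.testing.data_generation is the intended helper:

from aeon.testing.data_generation import make_example_3d_numpy

# Small reproducible random collection: 10 univariate series of length 32.
X, _ = make_example_3d_numpy(n_cases=10, n_channels=1, n_timepoints=32, random_state=0)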

@Ramana-Raja (Contributor, Author) commented Apr 4, 2025

Hi, this is a complex PR and the project is currently very busy so it is unlikely this will be in soon. I have left a few comments but I don't imagine they will be the last.

I see you linked some results above at some point, but this seems to be for just one of the estimators? Not sure if that is this or the original code. One of the things I am going to ask for before merging is a comparison for both this and the original, so that is also something you can do.

The results compare how similar the original implementation's output is to this one's. For some reason, if I don't set a random state the test case fails at "assert np.array_equal(estimator.labels_, estimator.predict(data))", and I am not sure why. By the way, thanks for taking the time to review this.

@MatthewMiddlehurst (Member) commented:

I see the image you posted, but it only has one column of ARI scores. I'm not sure what those scores are for: the clusters produced by yours or by the original. We would want them for both so we can compare.

Test failure seems legit. labels_ and the predict output should be the same.

@Ramana-Raja (Contributor, Author) commented Apr 5, 2025

I see the image you posted, but it only has one column of ARI scores. I'm not sure what those scores are for: the clusters produced by yours or by the original. We would want them for both so we can compare.

Test failure seems legit. labels_ and the predict output should be the same.

The ARI scores are calculated between the output produced by this implementation and the original one (i.e. between the original model's output and this model's output). The test cases only fail when random_state is not provided; as you can see, the estimator was passed without a random_state.

[image: screenshot of the failing test]

@MatthewMiddlehurst (Member) commented:

It should not fail with no random_state, or at least we should know why and that it is unsolvable.

So you took the cluster predictions from both and used those to calculate ARI? That is not how you evaluate these algorithms if so.

@Ramana-Raja (Contributor, Author) commented Apr 6, 2025

It should not fail with no random_state, or at least we should know why and that it is unsolvable.

So you took the cluster predictions from both and used those to calculate ARI? That is not how you evaluate these algorithms if so.

Without specifying a random state, the transformed data (from _get_transformed_data) will be different even when using the same input, which results in differences between the predictions and labels_ (as we are calling _get_transformed_data in both fit and predict). I thought ARI is typically used to assess the similarity between two clustering outputs, such as between the original model and our implementation. However, if you'd prefer that I evaluate the similarity of each model's output against the true y values instead, I'm happy to do that.

@MatthewMiddlehurst (Member) commented:

Without specifying a random state, the transformed data (from _get_transformed_data) will be different even when using the same input, which results in differences between the predictions and labels_ (as we are calling _get_transformed_data in both fit and predict).

Yes, why is this happening?

I thought ARI is typically used to assess the similarity between two clustering outputs, such as between the original model and our implementation. However, if you'd prefer that I evaluate the similarity of each model's output against the true y values instead, I'm happy to do that.

Yes, please do that. I am more interested in performance against the labels. Your previous results do show that there are large differences between the clusterers for some datasets, it looks like?

@Ramana-Raja (Contributor, Author) commented Apr 11, 2025

Without specifying a random state, the transformed data (from _get_transformed_data) will be different even when using the same input, which results in differences between the predictions and labels_ (as we are calling _get_transformed_data in both fit and predict).

Yes, why is this happening?

_get_parameterised_data uses

quantiles = random_state.permutation(quantiles)

so without setting a random state the quantiles might change between fit and predict. Similarly, _fit_biases

biases = _fit_biases(
            X,
            n_channels_per_combination,
            channel_indices,
            dilations,
            num_features_per_dilation,
            quantiles,
            self.indices,
            self.random_state,
        )

also depends on the random state, so without it the parameters can end up different even for the same input.
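To illustrate the point with a toy example (not the actual aeon code): two unseeded generators will almost surely permute the quantiles differently, which is exactly why the parameters drawn in fit and predict diverge.

import numpy as np

quantiles = np.linspace(0.1, 0.9, 9)
rng_fit = np.random.default_rng()      # unseeded generator, as in fit without random_state
rng_predict = np.random.default_rng()  # a second unseeded generator, as in predict

# Almost always False: the permutations differ, so the derived parameters differ too.
print(np.array_equal(rng_fit.permutation(quantiles), rng_predict.permutation(quantiles)))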

I thought ARI is typically used to assess the similarity between two clustering outputs, such as between the original model and our implementation. However, if you'd prefer that I evaluate the similarity of each model's output against the true y values instead, I'm happy to do that.

Yes, please do that. I am more interested in performance against the labels. Your previous results do show that there are large differences between the clusterers for some datasets, it looks like?

Here is the result:
[image: table of ARI results per dataset]

It also aligns with the original results: https://github.com/jorgemarcoes/R-Clustering/blob/main/results/benchmark_UCR_results.csv
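For context, the evaluation against ground-truth labels can be sketched as below, using scikit-learn's adjusted_rand_score; the RClusterer import path comes from the test snippet above, load_gunpoint stands in for any UCR dataset, and the constructor arguments are assumptions rather than the PR's exact API:

from sklearn.metrics import adjusted_rand_score

from aeon.clustering.feature_based._r_cluster import RClusterer
from aeon.datasets import load_gunpoint

X, y = load_gunpoint(split="train")
clusterer = RClusterer(n_clusters=2, random_state=0)  # parameter names assumed
labels = clusterer.fit_predict(X)
print(adjusted_rand_score(y, labels))  # ARI of the predicted clustering vs. the true labels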

@MatthewMiddlehurst (Member) commented:

Is _get_parameterised_data generating the Rocket kernels? If so, why are we generating new kernels in predict?

@Ramana-Raja (Contributor, Author) commented Apr 11, 2025

Is _get_parameterised_data generating the Rocket kernels? If so, why are we generating new kernels in predict?

Since those kernels depend on the input data, I figured it made sense to generate them from the given data in predict as well. The original source code only had fit_predict, so that's what led me to think this way. Do you think it's reasonable to use the same parameters from fitting in predict too?

@MatthewMiddlehurst (Member) commented:

No, a new kernel is a completely new feature essentially. The feature set you are creating in predict is completely different from the one you are generating in fit.

@Ramana-Raja (Contributor, Author) commented:

No, a new kernel is a completely new feature essentially. The feature set you are creating in predict is completely different from the one you are generating in fit.

Should I use the same parameters created in fit for predict too?

@Ramana-Raja (Contributor, Author) commented Apr 12, 2025

@MatthewMiddlehurst I have updated the predict function to utilize the parameters from fit, and it’s now passing all the test cases. Feel free to take a look when you get a chance.
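As a rough, self-contained illustration of that pattern (a toy analogue, not the PR code): the random "kernels" are drawn once in fit, stored on the estimator, and reused in predict, so labels_ and predict on the same data agree even without a fixed random_state.

import numpy as np
from sklearn.cluster import KMeans


class FitOnceReuseSketch:
    """Toy clusterer: draw a random projection once in fit and reuse it in predict."""

    def __init__(self, n_kernels=100, n_clusters=2, random_state=None):
        self.n_kernels = n_kernels
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X):
        rng = np.random.default_rng(self.random_state)
        # Draw the random "kernels" once and store them for later use.
        self.kernels_ = rng.normal(size=(X.shape[1], self.n_kernels))
        self._km = KMeans(
            n_clusters=self.n_clusters, n_init=10, random_state=self.random_state
        )
        self.labels_ = self._km.fit_predict(X @ self.kernels_)
        return self

    def predict(self, X):
        # Reuse the stored kernels so fit and predict share the same feature space.
        return self._km.predict(X @ self.kernels_)

With this structure, predict on the training data matches labels_ regardless of the seed, which is the property the failing test checks.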

Labels: clustering (Clustering package), enhancement (New feature, improvement request or other non-bug code enhancement)

4 participants