[DOC] tidy classification and shapelet notebooks (#2381)
* tidy shapelet notebook

* nb

* nb

* nb

* deep

* dictionary

* distance

* feature

* interval

* IndividualInceptionClassifier
TonyBagnall authored Nov 25, 2024
1 parent 7d9f145 commit 9651e4a
Showing 8 changed files with 549 additions and 346 deletions.
9 changes: 3 additions & 6 deletions examples/classification/classification.ipynb
@@ -205,11 +205,8 @@
"\n",
"To apply `sklearn` classifiers directly, the data needs to be reshaped into a 2D\n",
"numpy array. We also offer the ability to load univariate TSC problems directly in 2D\n",
" arrays, although we recommend using 3D numpy of shape `(n_cases, 1, n_timepoints)`\n",
" for univariate collections."
]
},
{
@@ -449,7 +446,7 @@
},
"source": [
"An alternative for MTSC is to build a univariate classifier on each channel, then\n",
"ensemble. Channel ensembling can be easily done via ``ChannelEnsembleClassifier``\n",
"ensemble. Channel ensembling can be easily done via ``ClassifierChannelEnsemble``\n",
"which fits classifiers independently to specified channels, then\n",
"combines predictions through a voting scheme. The example below builds a DrCIF\n",
"classifier on the first channel and a RocketClassifier on the fourth and fifth\n",
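A hedged sketch of the channel ensembling just described; the constructor signature (a list of ``(name, classifier, channels)`` tuples) and the ``DrCIFClassifier`` class name are assumptions, not code from the notebook:

```python
# Sketch only: signature and class names assumed. Fits DrCIF on the first
# channel and ROCKET on the fourth and fifth, then combines by voting.
from aeon.classification.compose import ClassifierChannelEnsemble
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.interval_based import DrCIFClassifier

clf = ClassifierChannelEnsemble(
    classifiers=[
        ("drcif", DrCIFClassifier(), [0]),       # first channel
        ("rocket", RocketClassifier(), [3, 4]),  # fourth and fifth channels
    ]
)
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```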
22 changes: 13 additions & 9 deletions examples/classification/convolution_based.ipynb
@@ -9,20 +9,23 @@
}
},
"source": [
"\n",
"# Convolution based time series classification in aeon\n",
"\n",
"This notebook is a high level introduction to using and configuring convolution based\n",
"classifiers in aeon. Convolution based classifiers are based on the ROCKET transform\n",
"[1] and the subsequent extensions MiniROCKET [2], MultiROCKET [3], HYDRA [4] \n",
"a combination of ROCKET and HYDRA. These\n",
"[1] and the subsequent extensions MiniROCKET [2], MultiROCKET [3] and HYDRA [4]. These\n",
"transforms can be used in pipelines, but we provide convolution based classifiers\n",
" based on ROCKET variants for ease of use and reproducibility. These are the \n",
" RocketClassifier, MiniRocketClassifier, MultiRocketClassifier, MultiRocketHydraClassifier [4] and an ensemble of \n",
"ROCKET classifiers called the Arsenal. The RocketClassifier \n",
" RocketClassifier, MiniRocketClassifier, MultiRocketClassifier,\n",
" MultiRocketHydraClassifier (a combination of MultiRocket and Hydra described in [4])\n",
" and the Arsenal, an ensemble of\n",
"ROCKET classifiers. The RocketClassifier\n",
" combines the ROCKET transform with a scikit-learn RidgeClassifierCV classifier, while \n",
" Hydra and MultiRocketHydra further extend this approach by enhancing feature \n",
" extraction and handling multiple feature maps more effectively.The term\n",
" convolution and kernel are used interchangably in this notebook. A convolution is a\n",
" extraction and handling multiple feature maps more effectively. They can be\n",
" configured to use other scikit-learn classifiers. The term\n",
" convolution and kernel are used interchangeably in this notebook. A convolution is a\n",
" subseries that is used to create features for a time series. To do this, a\n",
" convolution is run along a series, and the dot product is calculated. This creates a\n",
" new series (often called an activation map or feature map) where large values\n",
@@ -39,7 +42,7 @@
"\n",
"A large number of random convolutions are generated, and the two features are\n",
"combined to produce a transformed train data set. This is used to train a linear\n",
"classifier. [1] reccomend a RIDGE Regression Classifier using cross-validation to\n",
"classifier. [1] recommend a RIDGE Regression Classifier using cross-validation to\n",
"train the $L_2$-regularisation parameter $\\alpha$. A logistic regression classifier\n",
"is suggested as a replacement for larger datasets.\n",
"<img src=\"./img/rocket2.png\" width=\"600\" alt=\"ROCKET.\">\n",
@@ -215,8 +218,9 @@
}
},
"source": [
"The ``MiniRocketClassifier`` [2] uses the ``MiniRocket`` transform a fast version of \n",
"``Rocket`` that uses hard coded convolutions and only uses PPV. MultiROCKET [3] \n",
"The ``MiniRocketClassifier`` [2] uses the ``MiniRocket`` transform, a fast version of\n",
"``Rocket`` that uses hard coded convolutions and only uses PPV.\n",
"MultiROCKET [3]\n",
"adds three new pooling operations extracted from each\n",
" kernel: mean of positive values (MPV), mean of indices of positive values (MIPV) and\n",
" longest stretch of positive values (LSPV). ``MultiRocket`` generates a total of 50k\n",
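A short sketch fitting two of the classifiers named in this section; the class names are taken from the text above, and default configurations are assumed:

```python
# Fit MiniRocket and MultiRocketHydra variants with default settings.
from aeon.classification.convolution_based import (
    MiniRocketClassifier,
    MultiRocketHydraClassifier,
)
from aeon.datasets import load_unit_test

X_train, y_train = load_unit_test(split="train")
X_test, _ = load_unit_test(split="test")

for clf in (MiniRocketClassifier(), MultiRocketHydraClassifier()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.predict(X_test)[:5])
```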
17 changes: 8 additions & 9 deletions examples/classification/deep_learning.ipynb
@@ -10,8 +10,7 @@
"The networks that are common to classification, regression and clustering are in the\n",
"`networks` module. Our deep learning classifiers are based those used in deep\n",
"learning bake off [1] and recent experimentation [2]. [3] provides an extensive recent\n",
"review of related deep learning work.\n",
"review of related deep learning work."
"review of related deep learning work.\n"
]
},
{
@@ -52,7 +51,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1m65/65\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 2ms/step\n"
"\u001B[1m65/65\u001B[0m \u001B[32m━━━━━━━━━━━━━━━━━━━━\u001B[0m\u001B[37m\u001B[0m \u001B[1m0s\u001B[0m 2ms/step\n"
]
},
{
@@ -97,15 +96,15 @@
"\n",
"<img src=\"./img/resnet.png\" width=\"600\" alt=\"ROCKET.\">\n",
"\n",
"The InceptionTime deep learning algorithm Subsequent to [1],\n",
"InceptionTime is an ensemble of five SingleInceptionTime deep learning\n",
"classifiers. Each base classifier shares the same architecture based on\n",
"Inception modules. Diversity is achieved through randomly intialising weights.\n",
"A SingleInceptionTimeClassifier has the following structure.\n",
"The Inception Time deep learning algorithm was proposed subsequent to [1].\n",
"``InceptionTimeClassifier`` is an ensemble of five ``IndividualInceptionClassifier``\n",
"deep learning classifiers. Each base classifier shares the same architecture based on\n",
"Inception modules. Diversity is achieved through randomly initialising weights.\n",
"A ``IndividualInceptionClassifier`` has the following structure.\n",
"\n",
"<img src=\"./img/inception_module.png\" width=\"600\" alt=\"ROCKET.\">\n",
"\n",
"A SingleInceptionTimeClassifier is structured as follows.\n",
"An ``IndividualInceptionClassifier`` is structured as follows.\n",
"\n",
"<img src=\"./img/inception.png\" width=\"600\" alt=\"ROCKET.\">"
]
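A minimal sketch of fitting the ensemble described above; this requires TensorFlow, the ``n_classifiers``, ``n_epochs`` and ``batch_size`` parameter names are assumptions, and the tiny epoch count is only to keep the sketch fast:

```python
from aeon.classification.deep_learning import InceptionTimeClassifier
from aeon.datasets import load_unit_test

X_train, y_train = load_unit_test(split="train")
X_test, _ = load_unit_test(split="test")

# Ensemble of five Individual Inception classifiers, diversified by random
# weight initialisation as described above.
inc = InceptionTimeClassifier(n_classifiers=5, n_epochs=10, batch_size=16)
inc.fit(X_train, y_train)
y_pred = inc.predict(X_test)
```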
31 changes: 19 additions & 12 deletions examples/classification/dictionary_based.ipynb
@@ -8,29 +8,37 @@
"source": [
"# Dictionary based time series classification in aeon\n",
"\n",
"Dictionary based approaches adapt the bag of words model commonly used in signal processing, computer vision and audio processing for time series classification. Like shapelet based algorithms, dictionary approaches use phase-independent subsequences by sliding a window over time series. However, rather than to measure the distance to a subsequence, as in shapelets, each window is transformed into a word, and the frequency of occurrence of repeating patterns is recorded. Algorithms following the dictionary model build a classifier by:\n",
"Dictionary based approaches adapt the bag of words model commonly used in signal\n",
"processing, computer vision and audio processing for time series classification. Like\n",
" shapelet based algorithms, dictionary approaches use phase-independent subsequences\n",
" by sliding a window over time series. However, rather than measuring the distance\n",
" between a series and\n",
" a subseries, as in shapelets, each window is transformed into a word, and the\n",
" frequency of occurrence of repeating patterns is measured. Algorithms following the\n",
" dictionary model build a classifier by:\n",
"\n",
"- Extracting subseries, aka windows, from a time series:\n",
"- Transforming each window of real values into a discrete-valued word (a sequence of\n",
"symbols over a fixed alphabet);\n",
"- Building a sparse feature vector of histograms of word counts; and\n",
"- Finally using a classification method from the machine learning repertoire on these feature vectors.\n",
"The figure illustrates these steps from a raw time series to a dictionary model using overlapping windows.\n",
"- Building a sparse feature vector of histograms of word counts; and finally\n",
"- Using a classification method from the machine learning repertoire on these feature\n",
" vectors.\n",
"\n",
"<img src=\"./img/dictionary.png\" width=\"800\" alt=\"Dictionary based time series\n",
"classification\"> [<i>&#x200B;</i>](./img/dictionary.png)\n",
"The figure illustrates these steps from a raw time series to a dictionary model using overlapping windows.\n",
"\n",
"<img src=\"./img/dictionary.png\" width=\"800\">\n",
"Dictionary-based methods differ in the way they transform a window of real-valued\n",
"measurements into discrete words (a process commonly called discretisation) in Step 2.\n",
"Many methods are based on a representation called Symbolic Fourier Approximation\n",
"(SFA). To create a discrete word from a window of continuous values in a series, SFA\n",
"follows the following steps:\n",
"\n",
"- Values in each window are normalized to have standard deviation of 1.\n",
"- The dimensionality of each normalized window is reduced by the use of the truncated\n",
"Fourier transform: the window subseries is transformed using as fast Fourier transform,\n",
"and only the first few coefficients are retained.\n",
"- Each coefficient is discretised into a symbol from an alphabet a fixed size to\n",
"form a word.\n",
"- Each coefficient is discretised into a symbol from an alphabet a fixed size to form\n",
" a word.\n",
"\n",
"Creating words from windows requires three parameters:\n",
"- 'window_size' specifies how long each window is;\n",
@@ -43,7 +51,7 @@
"\n",
"\n",
"## Imports and Load Data\n",
"\n"
"``aeon`` currently has the following dictionary based classifiers:"
]
},
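As a hedged illustration of the three discretisation parameters listed above, a minimal SFA sketch; the parameter names are assumed to match aeon's ``SFA`` transformer, and the exact output container depends on configuration:

```python
from aeon.datasets import load_unit_test
from aeon.transformations.collection.dictionary_based import SFA

X_train, y_train = load_unit_test(split="train")

# Slide a window of length 8 over each series, keep 4 Fourier coefficients
# and discretise each coefficient into one of 4 symbols.
sfa = SFA(window_size=8, word_length=4, alphabet_size=4)
bags = sfa.fit_transform(X_train, y_train)  # bags of word counts per series
```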
{
@@ -300,16 +308,15 @@
"\n",
"The `TemporalDictionaryEnsemble` (TDE) aggregates the best components of cBOSS and WEASEL with the concept of Spatial Pyramids used in computer vision, first used in this context in an algorithm called Spatial BOSS \\[6\\]. Spatial pyramids split the time series up in to contiguous segments and construct dictionary\n",
"\n",
"<img src=\"./img/spatial_pyramids.png\" width=\"800\" alt=\"Spatial pyramids used in\n",
"TDE to capture location information.\">\n",
"<img src=\"./img/spatial_pyramids.png\" width=\"800\">\n",
"\n",
"At the top level of the pyramid, the whole series is used. cBOSS classifiers are\n",
"built on the whole series that use bigrams and Information Gain Binning (IGB)\n",
"proposed for WEASEL. At the next level, cBOSS classifiers are built independently on\n",
"each half of the series. At the third level, quarters of the series are used. Once at\n",
" the final level, all the histograms are concatenated.\n",
"\n",
"The parameter space for the search for the TDE model is much larger, because of the extra parameters. Rather than random search of parameter combinations, after `randomly_selected_params` model evaluations, a Gaussian process regressor is used to select new parameter sets to evaluate for the ensemble, predicting the accuracy of a set of parameter values using past classifier performances. This improves overall performance. Like cBOSS, TDE is contractable, i.e. you can specify the approximate maximum train time with the `time_limit_in_minutes` argument.\n"
"The parameter space for the search for the TDE model is much larger, because of the extra parameters. Rather than random search of parameter combinations, after `randomly_selected_params` model evaluations, a Gaussian process regressor is used to select new parameter sets to evaluate for the ensemble, predicting the accuracy of a set of parameter values using past classifier performances. This improves overall performance. Like cBOSS, TDE is contractable, i.e. you can specify the approximate maximum train time with the `time_limit_in_minutes` argument."
]
},
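A minimal sketch of a contracted TDE fit, using the ``time_limit_in_minutes`` argument described above; the one-minute budget is arbitrary:

```python
from aeon.classification.dictionary_based import TemporalDictionaryEnsemble
from aeon.datasets import load_unit_test

X_train, y_train = load_unit_test(split="train")

# Approximate one-minute train-time contract.
tde = TemporalDictionaryEnsemble(time_limit_in_minutes=1, random_state=0)
tde.fit(X_train, y_train)
```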
{
26 changes: 16 additions & 10 deletions examples/classification/distance_based.ipynb
@@ -21,8 +21,7 @@
" Distance functions have been mostly used with a nearest neighbour (NN) classifier,\n",
" but you can use them with [sklearn and aeon distances](../distances/sklearn_distances.ipynb)\n",
"\n",
"<img src=\"./img/dtw2.png\" width=\"400\" alt=\"Example of warping two series to the best\n",
"alignment.\">\n",
"<img src=\"./img/dtw2.png\" width=\"400\">\n",
"\n"
]
},
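A minimal 1-NN DTW sketch using aeon's distance-based nearest neighbour classifier; the dataset is just a stand-in:

```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.datasets import load_unit_test

X_train, y_train = load_unit_test(split="train")
X_test, _ = load_unit_test(split="test")

# 1-NN with dynamic time warping as the elastic distance.
knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, distance="dtw")
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
```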
@@ -234,7 +233,8 @@
"The first algorithm to significantly out perform 1-NN DTW on the UCR data was the\n",
"Elastic Ensemble (EE) [1]. EE is a weighted ensemble of 11 1-NN classifiers with a\n",
"range of elastic distance measures. It was the best performing distance based\n",
"classifier in the bake off. Elastic distances can be slow, and EE requires cross\n",
"classifier in the original bake off [3]. Elastic distances can be slow, and EE requires\n",
"cross\n",
"validation to find the weights of each classifier in the ensemble. You can configure\n",
"EE to use specified distance functions, and tell it how much."
]
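A hedged sketch of configuring EE; the ``distance_measures`` name and the two ``proportion_*`` arguments (which control how much of the parameter grid and training data are used in tuning) are assumptions:

```python
from aeon.classification.distance_based import ElasticEnsemble

# Restrict EE to two distances and subsample the tuning work (names assumed).
ee = ElasticEnsemble(
    distance_measures=["dtw", "msm"],
    proportion_of_param_options=0.1,       # fraction of the parameter grid
    proportion_train_in_param_finding=0.3,  # fraction of train data for tuning
)
# ee.fit(X_train, y_train)
```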
@@ -294,8 +294,8 @@
"### Proximity Forest\n",
"\n",
"Proximity Forest [2] is a distance-based ensemble of decision trees.\n",
"It is the current state-of-the-art distance-based classifier that\n",
"creates an ensemble of decision trees, where the splits are based\n",
"It was the best performing algorithm in the 2024 bakeoff [4].\n",
"It creates an ensemble of decision trees, where the splits are based\n",
"on the similarity between time series measured using various\n",
"parameterised distance measures. The current algorithm is\n",
"implemented to work for univariate, equal-length time-series data."
@@ -334,8 +334,7 @@
},
"source": [
"## Performance on the UCR univariate datasets\n",
"You can find the dictionary based classifiers as follows. Note we do not have a\n",
"Proximity Forest implementation in aeon yet, but we do have the results"
"You can find the dictionary based classifiers as follows."
]
},
{
@@ -482,12 +481,
"source": [
"## References\n",
"\n",
"[1] Lines J, Bagnall A (2015) Time series classification with ensembles of elastic\n",
"distance measures. Data Mining and Knowledge Discovery 29:565–592\n",
"[1] Lines J and Bagnall A (2015) Time series classification with ensembles of elastic\n",
"distance measures. Data Mining and Knowledge Discovery 29:565–592 https://link.springer.com/article/10.1007/s10618-014-0361-2\n",
"\n",
"[2] Lucas et al. (2019) Proximity Forest: an effective and scalable distance-based\n",
"classifier. Data Mining and Knowledge Discovery 33: 607--635 https://arxiv.org/abs/1808.10594\n",
"\n"
"\n",
"[3] Bagnall et al. (2017) The great time series classification bake off: a review and\n",
"experimental\n",
"evaluation of recent algorithmic advances\n",
"Data mining and knowledge discovery 31: https://link.springer.com/article/10.1007/S10618-016-0483-9\n",
"\n",
"[4] Middlehurst et al. (2024) Bake off redux: a review and experimental evaluation of\n",
" recent time series classification algorithms. Data mining and knowledge discovery 38: https://link.springer.com/article/10.1007/s10618-024-01022-1"
]
}
],
9 changes: 5 additions & 4 deletions examples/classification/feature_based.ipynb
@@ -103,8 +103,9 @@
" for catch22 build a decision tree classifier after applying the transform to each\n",
" data series.\n",
"\n",
"The Catch22Classifier is simply a convenient wrapper for a pipelin of a Catch22\n",
"transformation and a sklearn random forest classifier by default."
"The ``Catch22Classifier`` is simply a convenient wrapper for a pipeline of a Catch22\n",
"transformation and a sklearn random forest classifier. It can be configured to use\n",
"any sklearn classifier in the constructor."
]
},
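A hedged sketch of swapping the default estimator; the ``estimator`` constructor argument is an assumption:

```python
from sklearn.ensemble import RandomForestClassifier

from aeon.classification.feature_based import Catch22Classifier
from aeon.datasets import load_unit_test

X_train, y_train = load_unit_test(split="train")

# Catch22 features piped into a sklearn classifier of our choosing.
c22 = Catch22Classifier(estimator=RandomForestClassifier(n_estimators=200))
c22.fit(X_train, y_train)
```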
{
@@ -176,7 +177,7 @@
"collection of just under 800 features extracted from time series. An extensive\n",
"comparison of feature based pipelines [3] found that TSFresh followed by a rotation\n",
"forest classifier [4] was significantly more accurate than other combinations. This\n",
"pipeline is hard coded into an aeon classifier called the FreshPRINCEClassifier."
"pipeline is hard coded into an aeon classifier called the ``FreshPRINCEClassifier``."
]
},
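A minimal sketch of the hard-coded pipeline with default settings; it requires the optional ``tsfresh`` dependency:

```python
from aeon.classification.feature_based import FreshPRINCEClassifier
from aeon.datasets import load_unit_test

X_train, y_train = load_unit_test(split="train")

fp = FreshPRINCEClassifier()  # TSFresh features + rotation forest
fp.fit(X_train, y_train)
```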
{
@@ -294,7 +295,7 @@
"from aeon.datasets.tsc_datasets import univariate\n",
"\n",
"names = [t[0].replace(\"Classifier\", \"\") for t in est]\n",
"names.remove(\"Summary\") # Need to evaluate this\n",
"names.remove(\"Summary\")\n",
"results, present_names = get_estimator_results_as_array(\n",
" names, univariate, include_missing=False\n",
")\n",