diff --git a/docs/completed_projects.md b/docs/completed_projects.md new file mode 100644 index 0000000000..8ae121caf2 --- /dev/null +++ b/docs/completed_projects.md @@ -0,0 +1,40 @@ + +[//]: # (Try to put references in harvard style for consistency.) + +# aeon projects: completed + +This is a list of people who have completed projects with `aeon` or `aeon-neuro` +either as paid interns, as part of undergraduate or postgraduate projects, or as volunteers +mentored by `aeon` core developers. + +## 2024 + +### Interns + +1. Divya ({user}`itsadivya`): Proximity Forest 1 and 2. +Divya was a Google Summer of Code student. + +Her blog is here +2. Aadya ({user}`MatthewMiddlehurst`): Deep clustering + +3. Gabriel ({user}`MatthewMiddlehurst`): EEG classification with aeon-neuro + +4. Danieli ({user}`MatthewMiddlehurst`): catch22 and + +5. Ivan ({user}`MatthewMiddlehurst`): shapelets + +### MSc projects + +1. X: Riemannian EEG +2. X: shapelet quality measures + +### BSc projects + +1. X +2. X +3. X + +### Volunteers + +1. Frank +2. Adam diff --git a/docs/mentoring.md b/docs/mentoring.md index 3a0bb55a8c..acd0661426 100644 --- a/docs/mentoring.md +++ b/docs/mentoring.md @@ -1,17 +1,17 @@ [//]: # (Try to put references in harvard style for consistency.) -# Mentoring and Projects +# aeon projects: ongoing or potential -`aeon` runs a range of short to medium duration projects interacting with the community -and the code +`aeon` runs a range of short to medium duration projects that involve +developing or using `aeon` and interacting with the community and the code base. These projects are designed for internships, usage as part of undergraduate/postgraduate projects at academic institutions, and as options for programs such as [Google Summer of Code (GSoC)](https://summerofcode.withgoogle.com/).
For those interested in undertaking a project outside these scenarios, we recommend joining the [Slack](https://join.slack.com/t/aeon-toolkit/shared_invite/zt-22vwvut29-HDpCu~7VBUozyfL_8j3dLA) -and discussing with the project mentors. We aim to run schemes to +and discussing with the community. We aim to run schemes to help new contributors to become more familiar with `aeon`, time series machine learning research, and open-source software development. @@ -20,7 +20,7 @@ majority of them will require some knowledge of machine learning and time series ## Current aeon projects -This is a list of some of the projects we are interested in running in 2024. Feel +This is a list of some of the projects we are interested in running in 2024/25. Feel free to propose your own project ideas, but please discuss them with us first. We have an active community of researchers and students who work on `aeon`. Please get in touch via Slack if you are interested in any of these projects or have any questions. @@ -33,13 +33,13 @@ to open source. We list projects by time series task [Classification](#classification) 1. Optimizing the Shapelet Transform for classification and similarity search -2. EEG classification with aeon-neuro (Listed for GSoC 2024) -3. Improved Proximity Forest for classification (listed for GSoC 2024) +2. EEG classification with aeon-neuro +3. Implement TS-CHIEF 4. Improved HIVE-COTE implementation. 5. Compare distance based classification. [Forecasting](#forecasting) -1. Machine Learning for Time Series Forecasting (listed in GSoC 2024) +1. Machine Learning for Time Series Forecasting 2. Deep Learning for Time Series Forecasting 3. Implement ETS forecasters in aeon @@ -48,7 +48,7 @@ to open source. We list projects by time series task 2. Deep learning based clustering algorithms [Anomaly Detection](#anomaly-detection) -1. Anomaly detection with the Matrix Profile and MERLIN +1. 
Anomaly detection with the Matrix Profile, MERLIN and MADRID [Segmentation](#segmentation) 1. Time series segmentation @@ -58,7 +58,7 @@ to open source. We list projects by time series task 2. Implement channel selection algorithms [Visualisation](#visualisation) -1. Explainable AI with the shapelet transform (Southampton intern project). +1. Explainable AI with the shapelet transform [Regression](#regression) 1. Adapt forecasting regressors to time series extrinsic regression. @@ -147,7 +147,7 @@ transform: A new approach for time series shapelets. In International Conference Pattern Recognition and Artificial Intelligence (pp. 653-664). Cham: Springer International Publishing. -#### 2. EEG classification with aeon-neuro (Listed for GSoC 2024) +#### 2. EEG classification with aeon-neuro Mentors: Tony Bagnall ({user}`TonyBagnall`) and Aiden Rushbrooke @@ -193,73 +193,7 @@ Time Series Classification of Electroencephalography Data, IWANN 2023. 2. MNE Toolkit, https://mne.tools/stable/index.html 3. The Brain Imaging Data Structure (BIDS) standard, https://bids.neuroimaging.io/ -#### 3. Improved Proximity Forest for classification (listed for GSoC 2024) - -Mentors: Matthew Middlehurst ({user}`MatthewMiddlehurst`) and Tony Bagnall -({user}`TonyBagnall`) - -##### Related Issues -[#159](https://github.com/aeon-toolkit/aeon/issues/159) -[#428](https://github.com/aeon-toolkit/aeon/issues/428) - - -##### Description - -Distance-based classifiers such as k-Nearest Neighbours are popular approaches to time -series classification. They primarily use elastic distance measures such as Dynamic Time -Warping (DTW) to compare two series. The Proximity Forest algorithm [1] is a -distance-based classifier for time series. The classifier creates a forest of decision -trees, where the tree splits are based on the distance between time series using -various distance measures. 
A recent review of time series classification algorithms [2] -found that Proximity Forest was the most accurate distance-based algorithm of those -compared. - -`aeon` previously had an implementation of the Proximity Forest algorithm, but it was -not as accurate as the original implementation (the one used in the study) and was -unstable on benchmark datasets. The goal of this project is to significantly overhaul -the previous implementation or completely re-implement Proximity Forest in `aeon` to -match the accuracy of the original algorithm. This will involve comparing against the -authors' Java implementation of the algorithm as well as alternate Python versions. -The mentors will provide results for both for alternative methods. While knowing -Java is not a requirement for this project, it could be beneficial. - -Recently, the group which published the algorithm has proposed a new version of the -Proximity Forest algorithm, Proximity Forest 2.0 [3]. This algorithm is more accurate -than the original Proximity Forest algorithm, and does not currently have an -implementation in `aeon` or elsewhere in Python. If time allows, the project could also -involve implementing and evaluating the Proximity Forest 2.0 algorithm. - -##### Project stages - -1. Learn about `aeon` best practices, coding standards and testing policies. -2. Study the Proximity Forest algorithm and previous `aeon` implementation. -3. Improve/re-implement the Proximity Forest implementation in `aeon`, with -the aim being to have an implementation that is as accurate as the original algorithm, -while remaining feasible to run. -4. Evaluate the improved implementation against the original `aeon` Proximity Forest -and the authors' Java implementation. -5. If time, implement the Proximity Forest 2.0 algorithm and repeat the above -evaluation. 
- -##### Expected Outcomes - -We expect the mentee engage with the aeon community and produce a high quality -implementation of the Proximity Forest algorithm(s) that gets accepted into the toolkit. - -##### References - -1. Lucas, B., Shifaz, A., Pelletier, C., O’Neill, L., Zaidi, N., Goethals, -B., Petitjean, F. and Webb, G.I., 2019. Proximity forest: an effective and scalable -distance-based classifier for time series. Data Mining and Knowledge Discovery, 33(3), -pp.607-635. -2. Middlehurst, M., Schäfer, P. and Bagnall, A., 2023. Bake off redux: a review and -experimental evaluation of recent time series classification algorithms. arXiv preprint -arXiv:2304.13029. -3. Herrmann, M., Tan, C.W., Salehi, M. and Webb, G.I., 2023. Proximity Forest 2.0: A -new effective and scalable similarity-based classifier for time series. arXiv -preprint arXiv:2304.05800. - -#### 4. Improved HIVE-COTE implementation +#### 3. Improved HIVE-COTE implementation Mentors: Matthew Middlehurst ({user}`MatthewMiddlehurst`) and Tony Bagnall ({user}`TonyBagnall`) @@ -302,7 +236,7 @@ alternative structures. This can easily develop into a research project. experimental evaluation of recent time series classification algorithms. arXiv preprint arXiv:2304.13029. -#### 5. Compare distance based classification and regression +#### 4. Compare distance based classification and regression Mentors: Chris Holder ({user}`cholder`) and Tony Bagnall ({user}`TonyBagnall`) @@ -327,9 +261,9 @@ datasets. ### Forecasting -#### 1. Machine Learning for Time Series Forecasting (listed in GSoC 2024) +#### 1. Machine Learning for Time Series Forecasting -Mentors: Tony Bagnall ({user}`TonyBagnall`) and Matthew Middlehurst (@MatthewMiddlehurst). +Mentors: Tony Bagnall ({user}`TonyBagnall`) and Leo Tsaprounis ({user}`ltsaprounis`) . ##### Related Issues [#265](https://github.com/aeon-toolkit/aeon/issues/265) @@ -353,7 +287,7 @@ SETAR-Tree [3]. ##### Expected Outcomes -1. Contributions to the aeon forecasting module. 
+1. Contributions to the new experimental aeon forecasting module. 2. Implementation of a machine learning forecasting algorithms. 3. Help write up results for a technical report/academic paper (depending on outcomes). @@ -497,7 +431,7 @@ Published: 07 September 2020 Volume 34, pages 1936–1962, (2020) ### Anomaly detection -#### 1. Anomaly detection with the Matrix Profile and MERLIN +#### 1. Anomaly detection with the Matrix Profile, MERLIN and MADRID Mentors: Matthew Middlehurst ({user}`MatthewMiddlehurst`) @@ -553,7 +487,7 @@ mining. Journal of Open Source Software, 4(39), p.1504. #### 1. Time series segmentation -Mentors: Tony Bagnall ({user}`TonyBagnall`) and TBC +Mentors: Tony Bagnall ({user}`TonyBagnall`) ##### Description @@ -723,7 +657,7 @@ Series Classification. AALTD, ECML-PKDD, Springer, 2021 ### Visualisation -#### 1. Explainable AI with the shapelet transform (Southampton intern project). +#### 1. Explainable AI with the shapelet transform Mentors: TonyBagnall ({user}`TonyBagnall`) and David Guijo-Rubio ({user}`dguijo`) @@ -741,13 +675,13 @@ source toolkits, familiarisation with the shapelet code and the development of a visualisation tool to help relate shapelets back to the training data. An outline for the project is -Weeks 1-2: Familiarisation with open source, aeon and the visualisation module. Make +1. Familiarisation with open source, aeon and the visualisation module. Make a contribution for a good first issue. -Weeks 3-4: Understand the shapelet transfer algorithm, engage in ongoing discussions +2. Understand the shapelet transform algorithm, engage in ongoing discussions for possible improvements, run experiments to create predictive models for a test data set -Weeks 5-6: Design and prototype visualisation tools for shapelets, involving a range +3.
Design and prototype visualisation tools for shapelets, involving a range of summary measures and visualisation techniques, including plotting shapelets on training data, calculating frequency, measuring similarity between -Weeks 7-8: Debug, document and make PRs to merge contributions into the aeon toolkit. +4. Debug, document and make PRs to merge contributions into the aeon toolkit. [1] Bagnall, A., Lines, J., Bostrom, A., Large, J. and Keogh, E. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, Volume 31, pages 606–660, (2017) [2] Ye, L., Keogh, E. Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Disc 22, 149–182 (2011). https://doi.org/10.1007/s10618-010-0179-5 diff --git a/docs/papers_using_aeon.md b/docs/papers_using_aeon.md index 5764664dbb..0031aadfe7 100644 --- a/docs/papers_using_aeon.md +++ b/docs/papers_using_aeon.md @@ -14,9 +14,17 @@ the paper and a link to the code in your personal GitHub or other repository. ## Classification -- Middlehurst, M. and Schäfer, P. and Bagnall, A. (2024). Bake off redux: a review +- Dempster, A., Tan, C. W., Miller, L., Foumani, N., Schmidt, D. and Webb, G. (2024). + Highly Scalable Time Series Classification for Very Large Datasets. ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. [Paper](https://ecml-aaltd.github.io/aaltd2024/articles/Dempster_AALTD24.pdf) +- Dempster, A., Schmidt, D. and Webb, G. (2024). QUANT: a minimalist interval method + for time series classification. Data Mining and Knowledge Discovery, Volume 38, + pages 2377–2402. [Paper](https://link.springer.com/article/10.1007/s10618-024-01036-9) +- Serramazza, D., Nguyen, T. and Ifrim, G. (2024). Improving the Evaluation and + Actionability of Explanation Methods for Multivariate Time Series Classification. + Proc.
ECML/PKDD [ArXiV](https://arxiv.org/abs/2406.12507) +- Middlehurst, M., Schäfer, P. and Bagnall, A. (2024). Bake off redux: a review and experimental evaluation of recent time series classification algorithms. - Data Mining and Knowledge Discovery, online first, open access. + Data Mining and Knowledge Discovery, Volume 38, pages 1958–2031. [Paper](https://link.springer.com/article/10.1007/s10618-024-01022-1) [Webpage/Code](https://tsml-eval.readthedocs.io/en/stable/publications/2023/tsc_bakeoff/tsc_bakeoff_2023.html) - Spinnato, F. and Guidotti, R. and Monreale, A. and Nanni, M. (2024). Fast, Interpretable, and Deterministic Time Series Classification With a Bag-of-Receptive-Fields. @@ -39,6 +47,12 @@ the paper and a link to the code in your personal GitHub or other repository. Learning on Temporal Data (pp. 39-55). [Paper](https://link.springer.com/chapter/10.1007/978-3-031-49896-1_4) +## Ordinal classification + +- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., Bagnall, A., and Hervás-Martínez, C. Convolutional and Deep Learning based techniques for Time Series Ordinal Classification. [ArXiV](https://arxiv.org/abs/2306.10084). +- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P. A., and Hervás-Martínez, C. (2024). O-Hydra: A Hybrid Convolutional and Dictionary-Based Approach to Time Series Ordinal Classification. In Conference of the Spanish Association for Artificial Intelligence (pp. 50-60). [Paper](https://link.springer.com/chapter/10.1007/978-3-031-62799-6_6). +- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., and Hervás-Martínez, C. (2023). A Dictionary-Based Approach to Time Series Ordinal Classification. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2023. Lecture Notes in Computer Science, vol 14135. [Paper](https://link.springer.com/chapter/10.1007/978-3-031-43078-7_44). + ## Regression - Guijo-Rubio, D., Middlehurst, M., Arcencio, G., Silva, D. and Bagnall, A. (2024).
@@ -51,17 +65,14 @@ the paper and a link to the code in your personal GitHub or other repository. (pp. 113-126). [Paper](https://link.springer.com/chapter/10.1007/978-3-031-49896-1_8) [Webpage/Code](https://tsml-eval.readthedocs.io/en/stable/publications/2023/rist_pipeline/rist_pipeline.html) -## Ordinal classification - -- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., Bagnall, A., and Hervás-Martínez, C. Convolutional and Deep Learning based techniques for Time Series Ordinal Classification. [ArXiV](https://arxiv.org/abs/2306.10084). -- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P. A., and Hervás-Martínez, C. (2024). O-Hydra: A Hybrid Convolutional and Dictionary-Based Approach to Time Series Ordinal Classification. In Conference of the Spanish Association for Artificial Intelligence (pp. 50-60). [Paper](https://link.springer.com/chapter/10.1007/978-3-031-62799-6_6). -- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., and Hervás-Martínez, C. (2023). A Dictionary-Based Approach to Time Series Ordinal Classification. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2023. Lecture Notes in Computer Science, vol 14135. [Paper](https://link.springer.com/chapter/10.1007/978-3-031-43078-7_44). - ## Prototyping - Ismail-Fawaz, A. and Ismail Fawaz, H. and Petitjean, F. and Devanne, M. and Weber, - J. and Berretti, S. and Webb, GI. and Forestier, G. (2023 December "ShapeDBA: Generating Effective Time Series Prototypes Using ShapeDTW Barycenter Averaging." ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. [Paper](https://doi.org/10.1007/978-3-031-49896-1_9) [code](https://github.com/MSD-IRIMAS/ShapeDBA) + J. and Berretti, S. and Webb, G.I. and Forestier, G. (2023). ShapeDBA: Generating + Effective Time Series Prototypes Using ShapeDTW Barycenter Averaging. ECML/PKDD + Workshop on Advanced Analytics and Learning on Temporal Data.
[Paper](https://doi.org/10.1007/978-3-031-49896-1_9) [code](https://github.com/MSD-IRIMAS/ShapeDBA) - Holder, C., Guijo-Rubio, D., & Bagnall, A. J. (2023). Barycentre Averaging for the Move-Split-Merge Time Series Distance Measure. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management-Volume 1:, 51-62, pp. 51-62. [Paper](https://www.scitepress.org/Link.aspx?doi=10.5220/0012164900003598) +[Paper](https://www.scitepress.org/Link.aspx?doi=10.5220/0012164900003598) ## Generation Evaluation