This repo provides the code to run simple benchmarks for signature features on the UCR/UEA Time Series Classification repository. (Work in progress)
Some of our benchmark results are uploaded to this GitHub repository under `results/`.
Currently, there are two Jupyter notebooks that analyze the results:

- `compare_ensembles.ipynb` draws a comparison between two ensemble classifiers using signature features.
- `result_analysis.ipynb` draws a comparison between the XGBoost classifier and classical logistic regression on signature features.
This project uses the Python packages `iisignature`, `sklearn`, `numpy`, `pandas`, and `sktime`.
Furthermore, `luigi` is used to orchestrate the different tasks (slight overkill at the moment).
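To give a sense of what "signature features" look like with these packages, here is a minimal, self-contained sketch (not this repo's actual pipeline): each series is augmented with a time coordinate, its truncated signature is computed with `iisignature.sig`, and a scikit-learn classifier is fit on the resulting feature vectors. The preprocessing and the choice of `LinearSVC` are illustrative assumptions.

```python
import numpy as np
import iisignature
from sklearn.svm import LinearSVC

def signature_features(series, level=3):
    # Augment the 1-D series with a time coordinate to obtain a 2-D path,
    # then compute its truncated signature up to the given level.
    path = np.column_stack([np.linspace(0.0, 1.0, len(series)), series])
    return iisignature.sig(path, level)

# Toy stand-in for a UCR dataset: 20 series of length 100 with binary labels.
X = np.random.randn(20, 100)
y = np.random.randint(0, 2, size=20)

features = np.vstack([signature_features(s) for s in X])
clf = LinearSVC().fit(features, y)
print(clf.score(features, y))
```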
First clone the repo:

```
git clone https://github.com/zhy0/sig-tsc.git
```

Create a Python 3 virtual environment and install the dependencies:

```
cd sig-tsc
python3 -m venv env
source env/bin/activate
# install numpy and cython separately first, due to an issue with installing sktime
pip install numpy
pip install cython
# install the rest of the dependencies
pip install -r requirements.txt
```

Running the classification tasks (see below) will create a `pipeline` folder under `sig-tsc`, which contains the results of the classification.
Enter the `sig-tsc` folder and run the following commands:

```
# UCR datasets
wget http://www.timeseriesclassification.com/Downloads/Archives/Univariate2018_arff.zip
unzip Univariate2018_arff.zip
# UEA datasets (multivariate)
wget http://www.timeseriesclassification.com/Downloads/Archives/Multivariate2018_arff.zip
unzip Multivariate2018_arff.zip
```

Unzipping will create two folders, `NewTSCProblems` and `MultivariateTSCProblems`.
In `tasks.py`, the location of these two folders can be changed; by default they are assumed to be in the `sig-tsc` folder.
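If you keep the data elsewhere, the change in `tasks.py` amounts to pointing the dataset directories at the new location. The variable names below are hypothetical placeholders, not the actual names used in `tasks.py`:

```python
# Hypothetical placeholders -- check tasks.py for the real variable names.
UCR_DATA_DIR = "/path/to/NewTSCProblems"           # univariate (UCR) datasets
UEA_DATA_DIR = "/path/to/MultivariateTSCProblems"  # multivariate (UEA) datasets
```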
To run a benchmark on a single dataset, you can use the luigi command line:

```
python -m luigi --module tasks RunUnivariate --levels '[2,3,4]' --dataset ECG200 --model-type sklearn.svm.LinearSVC --sig-type sig --local-scheduler
```

See `tasks.py` for the individual parameters.
(The `--local-scheduler` flag tells Luigi to use its built-in local scheduler rather than a central scheduler daemon.)
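The same task can presumably also be launched from Python with `luigi.build`; the task class name and parameter values are taken from the CLI call above, but the exact Python-side parameter names and types are assumptions, so check `tasks.py`:

```python
import luigi
from tasks import RunUnivariate

# Assumed programmatic equivalent of the CLI call above (parameter names/types
# are guesses based on the command-line flags).
luigi.build(
    [RunUnivariate(levels=[2, 3, 4],
                   dataset="ECG200",
                   model_type="sklearn.svm.LinearSVC",
                   sig_type="sig")],
    local_scheduler=True,
)
```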
Running benchmarks on all datasets can be done with:

```
python tasks.py
```

You can edit the contents of `main()` to change the behavior.
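As an illustration (not the repo's actual `main()`), restricting the run to a few datasets might look like the following; the dataset names and parameters are examples only:

```python
import luigi
from tasks import RunUnivariate  # assumed import, as in the single-task example above

def main():
    # Example: run the univariate benchmark on a hand-picked subset of datasets.
    for name in ["ECG200", "GunPoint", "Coffee"]:
        luigi.build(
            [RunUnivariate(levels=[2, 3, 4], dataset=name,
                           model_type="sklearn.svm.LinearSVC", sig_type="sig")],
            local_scheduler=True,
        )

if __name__ == "__main__":
    main()
```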