This repository contains the code and data used in my master thesis Change-point detection in time series: a study of online methods applied to computer network measeurements, completed in the Systems Engineering and Computer Science Program of the COPPE/UFRJ. If you find any bug or have some doubt or observation, please let me know.
ALMEIDA, C. M. 2024. *Change-point Detection in Time Series: A Study of Online Methods Using Computer Network Measurements. M.Sc. thesis. COPPE/UFRJ, Rio de Janeiro, RJ, Brasil.
@mastersthesis{cma2024thesis,
title={Change-point Detection in Time Series: A Study of Online Methods Using Computer Network Measurements},
author={Almeida, Cleiton Moya de},
year={2024},
school={COPPE/UFRJ}
address={Rio de Janeiro, RJ, Brasil}
type={Master's thesis}
}
- The changepoint methods are implemented in the module Experiment/changepoint_module.py.
- There are three experiments:
- NDT Dataset: run the file Experiment/experiment_ndt.py to evaluate the methods using the NDT dataset. The hyperparameters are setted in the body of the script.
- For each method, a pickled Pandas object containing the results is created in the folder [Experiment/results_ndt].
- Shao Dataset: run the file Experiment/experiment_shao.py to evaluate the methods using the Shao dataset.
- For each method, a pickled Pandas object containing the results is created in the folder [Experiment/results_shao].
- Shao dataset grid search: run the file Experiment/experiment_shao.py to reproduce the grid search used for hyperparameters tunning. Two pickle files are created:
df_shaogrid_*.pkl
: dataframe containing the evaluation summary for each hyperparameter set;df_shaobest_*.pkl
: dataframe containing the dataset evaluation with the best hyperparamter set finded;
- Be aware that the VWCD and RRCF methods can take several hours to complete. You can easily modify the script to run only selected methods.
- NDT Dataset: run the file Experiment/experiment_ndt.py to evaluate the methods using the NDT dataset. The hyperparameters are setted in the body of the script.
- The graphics can be reproduced using the scripts in the folder Figures_scripts. I used the Spyder environment to run the scripts and manually save the figures.
The code was developed and tested with Python 3.9 using the following packages:
- Numpy 1.23.5
- Scipy 1.11.4
- Pandas 1.5.2
- Statsmodels 0.14.1
- Matplotlib 3.8.2
- R2Py 3.5.11
- rrcf 0.4.4
- Munkres 1.1.4
- Seaborn 0.13.1
- To run the The Pelt-NP algorithm, we use the R 4.3.2 with the package changepoint.np:
- changepoint.np 1.0.5
- We call R function inside Python code using the
R2Py
package. The environment variablesR_HOME
andR_USER
shall be properly setted. - To plot the Average Run Lenght functions (Figure 3.2), we used the package Statistical Process Control:
- spc 0.6.8
- Raspberry data collection: Code used in the Raspberry PI clients to automate the data collection.
- Shao Dataset: Wenqing Shao provided labeled datasets of round trip time and scripts to detect and evaluate changepoint methods. In this work, we used the real_trace_labelled dataset and also the benchmark funtions. Thanks Wenqin Shao to share your work.
- Mlab NDT: The NDT client reference implementation, in Go. We compiled this client to ARM (Raspberry PI OS 64bits) to execute the NDT tests.
- bocd: Python implementation of Bayesian Online Changepoint Detection for a normal model with unknown mean parameter
- DSM-bocd: Provide python implementations for BOCD using the normal model with unknown mean and variance parameter, multivariate gaussian and the DSM-BOCD.
- rrcf: Python implementation of the Robust Random Cut Forest algorithm.
- Ruptures: Python library for off-line change point detection (unfortunatelly, does no implement the Pelt-NP neither the MBIC loss fuctiom).
- scp: R package Statistical Process Control – Calculation of ARL and Other Control Chart Performance Measures
- changepoint.np: R implementation of the Pelt-NP