A demonstration of the power of the tsfresh feature extraction library by trying to predict the price of an asset.
With this project, I show how the tsfresh library can be applied to build a regression model on market data. The aim is to predict the value of the next data point in a given time series.
The model predicts the price from the beginning of the current month until now. It is trained on a user-selected number of days prior to the beginning of the current month.
The lookahead value sets how far ahead the regression model predicts: for each point in time, it predicts the price that many data points into the future.
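The split and the lookahead target described above can be sketched in plain Python. This is a minimal illustration under assumptions, not the code from functions.py: the helper name `split_and_label`, the `(timestamp, price)` row layout, and the sample prices are all hypothetical.

```python
from datetime import datetime, timedelta


def split_and_label(rows, now, train_days, lookahead):
    """Split (timestamp, price) rows into train/test sets with a lookahead target.

    Hypothetical helper: rows must be sorted by timestamp.
    """
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    train_start = month_start - timedelta(days=train_days)

    # Pair each row with the price `lookahead` data points later (the target).
    # The last `lookahead` rows have no known target yet, so they are skipped.
    labelled = [
        (rows[i][0], rows[i][1], rows[i + lookahead][1])
        for i in range(len(rows) - lookahead)
    ]

    # Train on the selected number of days before the current month,
    # evaluate on everything from the start of the current month onwards.
    train = [r for r in labelled if train_start <= r[0] < month_start]
    test = [r for r in labelled if r[0] >= month_start]
    return train, test


# Made-up daily prices from 2023-01-20 to 2023-02-10.
start = datetime(2023, 1, 20)
rows = [(start + timedelta(days=i), 100.0 + i) for i in range(22)]

train, test = split_and_label(
    rows, now=datetime(2023, 2, 10, 12), train_days=5, lookahead=1
)
print(len(train), len(test))
```

Each resulting tuple holds the timestamp, the price at that time, and the price `lookahead` data points later, which is the value the regression model would be asked to predict.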
This script was tested on Python 3.7.9.
Install the necessary libraries from requirements.txt.
Run the file predict_price.ipynb from a Jupyter notebook IDE.
All necessary functions are stored in functions.py.
The following variables can be chosen freely:
- ticker : crypto asset ticker that is available on Binance
- freq : interval ('1D','4H', ...)
- train_days: number of days of training data
- init : whether to calculate the best regressor (1) or skip that step (0)
- lookahead : predict price for this number of datapoints in the future
- verbose : whether to output the regression metrics during calculation
tsfresh rolling window size:
- max_window_size : max length of the rolled window for feature extraction
- min_window_size : minimum number of days for a rolling window
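As an illustration of these settings, the sketch below assigns example values and shows what a rolled window looks like. Only the variable names come from the list above; the values are assumptions (720 days and '4H' mirror the example later in this README), and `rolled_windows` is a hypothetical helper that measures window sizes in data points. tsfresh itself offers `roll_time_series` in `tsfresh.utilities.dataframe_functions`, with `max_timeshift` and `min_timeshift` parameters, for rolling a dataframe; how functions.py maps these variables onto it is not shown here.

```python
# Example settings. The values are assumptions chosen for illustration.
ticker = "BTCUSDT"
freq = "4H"
train_days = 720
init = 1
lookahead = 1
verbose = True
max_window_size = 30
min_window_size = 7


def rolled_windows(values, min_size, max_size):
    """Yield the trailing window ending at each position.

    Hypothetical helper: a window is only produced once at least
    `min_size` values are available, and never exceeds `max_size` values.
    """
    for end in range(min_size, len(values) + 1):
        start = max(0, end - max_size)
        yield values[start:end]


# Small made-up series so the windows are easy to inspect.
prices = [10, 11, 12, 13, 14, 15]
windows = list(rolled_windows(prices, min_size=2, max_size=4))
for window in windows:
    print(window)
```

Features are then extracted per window, so a larger maximum window gives each prediction more history at the cost of bigger intermediate dataframes.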
- Setting variables
- Import libraries
- Loading data
- Feature extraction
- Select the best sklearn regression model
- Train the model with the best regression method
- Plot the graph
- View the performance
- Get the prediction
Selecting the best_regressor takes some time on the first run, but subsequent runs are much faster because the worst-performing regressors are removed from the candidate list.
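The selection and pruning idea can be sketched as follows. This is an assumption about the general approach, not the project's implementation: the two toy regressors, the mean absolute error score, and the helper names `rank_regressors` and `prune` are hypothetical. scikit-learn regressors expose the same `fit`/`predict` interface, so they could take the place of the toy classes.

```python
from statistics import mean


class MeanRegressor:
    """Toy baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.value_ = mean(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


class LastValueRegressor:
    """Toy baseline: predicts the last feature of each row."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [row[-1] for row in X]


def mean_abs_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rank_regressors(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate and return their names, lowest error first."""
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = mean_abs_error(y_val, model.predict(X_val))
    return sorted(scores, key=scores.get)


def prune(candidates, ranking, keep):
    """Keep only the `keep` best candidates for later runs."""
    return {name: candidates[name] for name in ranking[:keep]}


candidates = {"mean": MeanRegressor(), "last_value": LastValueRegressor()}
X_train, y_train = [[1.0], [2.0], [3.0], [4.0]], [2.0, 3.0, 4.0, 5.0]
X_val, y_val = [[5.0], [6.0]], [6.0, 7.0]

ranking = rank_regressors(candidates, X_train, y_train, X_val, y_val)
best_regressor = ranking[0]
shortlist = prune(candidates, ranking, keep=1)
print(best_regressor)
```

Storing the shortlist between runs means a later run only has to fit the candidates that performed well before.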
The more days of training data, the better the model scores. Here we see the predicted values versus the actual values from a regression model that has been trained on 720 days of training data at 4-hour intervals.
We can also plot the predicted values for this month versus the actual values to see how well our regression model performs. This is particularly useful to see the effect of the number of days in the training data.
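Alongside the plot, the comparison of predicted and actual values can be summarised numerically. The sketch below computes three common regression metrics; which metrics the notebook actually reports is not stated here, so the choice of MAE, RMSE, and R² is an assumption, as are the helper name `regression_metrics` and the sample values.

```python
from math import sqrt
from statistics import mean


def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R² for two equally long sequences.

    Hypothetical helper; R² is undefined when all actual values are equal.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = mean(abs(e) for e in errors)
    rmse = sqrt(mean(e * e for e in errors))

    # R² compares the model's squared error with that of always
    # predicting the mean of the actual values.
    avg = mean(actual)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - avg) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up values for illustration.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 105.0, 105.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Computing the same metrics for several values of train_days gives a compact way to compare the effect of the training period.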
Please see my Medium blog post for more examples and information about the training and performance of the model.
This code could be improved in several ways:
- Evaluate how well it performs as a buy or sell indicator, rather than as a price prediction
- Extend the prediction range by predicting on a rolling window of earlier predictions
- Extract only the relevant features to reduce the size of the dataframes
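The buy or sell idea from the list above could be explored along these lines. This is a hypothetical sketch of one possible rule (buy when the predicted price is above the current price, otherwise sell), not something the project implements; the helper names and sample values are made up.

```python
def make_signals(current_prices, predicted_prices):
    """Return 'buy' when the prediction is above the current price, else 'sell'."""
    return [
        "buy" if predicted > current else "sell"
        for current, predicted in zip(current_prices, predicted_prices)
    ]


def hit_rate(signals, current_prices, next_prices):
    """Fraction of signals that matched the actual direction of the next price."""
    hits = sum(
        (signal == "buy") == (nxt > current)
        for signal, current, nxt in zip(signals, current_prices, next_prices)
    )
    return hits / len(signals)


# Made-up values for illustration.
current = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]
actual_next = [102.0, 101.0, 105.0, 106.0]

signals = make_signals(current, predicted)
rate = hit_rate(signals, current, actual_next)
print(signals, rate)
```

A hit rate like this only measures direction, so a model with a modest regression score could still be useful, or useless, as an indicator.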
While this script is by no means intended to reliably predict the price of a crypto asset, it clearly demonstrates the power of the tsfresh automated feature extraction library.
Written by Frank Trioen, February 2023