Ubiquant market predictions Time-Series Kaggle Competition

1. Introduction

This repository details the work done by Team C for the Africa Data Science Intensive (DSI) program Module 2 task. The goal of the task was to compete in the time-series prediction competition by Ubiquant on Kaggle. Time-series forecasting is a critical part of data science, with use cases such as epidemiology, inventory planning for businesses, and stock trading. The models and approaches used for the competition are detailed here and in our main notebook.

Code Files

File	Description
Main Notebook	Main notebook with EDA and Discussions on Models
Ensemble Inference	Notebook that loads weights and makes ensemble prediction
Model 1 Training	Model 1 DNN Model used in Ensemble Model
Model 2 Training	Model 2 DNN Model used in Ensemble Model
Model 3 Training	Model 3 DNN Model used in Ensemble Model
Investment_ID Clustering	Kmeans Clustering of Investment IDs
Model 1 DNN Optimization	Study of ReLU activation vs. Swish activation in Model 1
Model 2 DNN Optimization	Study of dropout layers in Model 2
Light GBM	Light GBM notebook for feature importance and model training
EDA and Clustering	EDA and hierarchical clustering of investment IDs using Pearson correlations
Data Preparation (LSTM)	Data preprocessing for multivariate time-series model with LSTM
Training (LSTM)	Training of multivariate time-series model with LSTM
Time Series Forecasting	Predict target using multivariate time-series model with LSTM

2. Competition Description

Competition description taken from Kaggle:

"Regardless of your investment strategy, fluctuations are expected in the financial market. Despite this variance, professional investors try to estimate their overall returns. Risks and returns differ based on investment types and other factors, which impact stability and volatility. To attempt to predict returns, there are many computer-based algorithms and models for financial market trading. Yet, with new techniques and approaches, data science could improve quantitative researchers' ability to forecast an investment's return."

"In this competition, you’ll build a model that forecasts an investment's return rate. Train and test your algorithm on historical prices. Top entries will solve this real-world data science problem with as much accuracy as possible."

3. EDA

Dataset Description

row_id - A unique identifier for the row.
time_id - The ID code for the time the data was gathered. The time IDs are in order, but the real time between the time IDs is not constant and will likely be shorter for the final private test set than in the training set.
investment_id - The ID code for an investment. Not all investments have data in all time IDs.
target - The target.
[f_0:f_299] - Anonymized features generated from market data.

4. Approaches

For each model in this project, we tried several fine-tuning approaches in the hope of improving performance. This section discusses the hyperparameter settings and activation functions we tried, along with their performance and final scores.

DNN with the Leaky Relu Activation Function

Leaky ReLU is a variant of the ReLU activation function. With ReLU, the gradient is 0 for all inputs less than zero, which deactivates the neurons in that region and can cause the dying ReLU problem. Leaky ReLU addresses this: instead of outputting 0 for negative inputs x, it outputs a small linear component of x.

ReLU function:

f(x) = max(0, x)

Leaky ReLU function:

f(x) = max(0.01*x, x)

source: https://www.mygreatlearning.com/blog/relu-activation-function/
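The two definitions above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the code from our notebooks; the function names and the `alpha` parameter are assumptions introduced here, with `alpha = 0.01` matching the slope in the formula above.

```python
def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: negative inputs are scaled by a small slope alpha (0 < alpha < 1)."""
    return max(alpha * x, x)


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative of Leaky ReLU; the value used at x == 0 is a convention."""
    return 1.0 if x > 0 else alpha


# Compare the two activations and the Leaky ReLU gradient on a few inputs.
for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x), leaky_relu_grad(x))
```

For negative inputs the Leaky ReLU gradient is `alpha` rather than 0, which is why the neuron can still receive weight updates in that region.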

For hyperparameter tuning, we varied the dropout rate, learning rate, and decay steps as shown in the table below to compare performance:


Parameter	Run 1	Run 2	Run 3
Dropout Rate	0.4	0.5	0.8
Learning Rate	0.003	0.1	0.001
Decay Steps	9700	10000	10000
RMSE	0.9104	0.9123	0.9128
MSE	0.9151	0.9152	0.9143
Score	0.144	0.143	0.143
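The table does not say how the decay steps enter the learning rate. As a rough sketch, assuming an exponential decay schedule, the learning rate at a given optimizer step could be computed as below. The function name and the `decay_rate` of 0.98 are hypothetical values chosen for illustration, not settings confirmed by our notebooks.

```python
def exponential_decay(initial_lr: float, step: int,
                      decay_steps: int, decay_rate: float) -> float:
    """Learning rate after `step` optimizer steps under exponential decay."""
    return initial_lr * decay_rate ** (step / decay_steps)


# First column of the table: learning rate 0.003, decay steps 9700.
# decay_rate = 0.98 is a hypothetical value, assumed for illustration only.
for step in (0, 9700, 19400):
    print(step, exponential_decay(0.003, step, 9700, 0.98))
```

Under this assumption, a larger decay steps value makes the learning rate shrink more slowly over the course of training.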

LightGBM

source: https://neptune.ai/blog/lightgbm-parameters-guide

In the first notebook, we ran the model with a fixed learning rate and max_depth, then adjusted these for the last two runs to compare results. This had no effect on the performance of the model. All 300 features were used initially; we then used the built-in function for plotting feature importance to reduce the feature set to 188. The metrics for this reduced feature set are labeled 'new features' in the table below. The only real difference the reduced feature set made was a drastic reduction in training time; all other metrics remained essentially the same. Fine-tuning the LightGBM parameters did not improve the model. As seen from the table below, the metrics and scores remained the same across runs.


Parameter	Run 1	Run 2	Run 3
Objective	Regression	Regression	Regression
Metric	MSE	MSE	MSE
Boosting_type	gbdt	gbdt	gbdt
lambda_l1	2.3e-05	2.3e-05	2.3e-05
lambda_l2	0.1	0.1	0.1
num_leaves	4	10	4
Feature_fraction	0.5	0.6	0.5
Bagging_fraction	0.9	0.8	0.9
Bagging_freq	7	6	7
min_child_samples	20	20	20
num_iterations	1000	1000	1000
learning_rate		0.1	0.1
max_depth		10	10
MSE	0.8055	0.8055	0.8055
MSE(new features)	0.8052	0.8052	0.8975
RMSE	0.8975	0.8975	0.8974
RMSE(new features)	0.8974	0.8974	0.8974
Pearson Corr.	0.1260	0.1260	0.1260
Pearson Corr.(new features)	0.1272	0.1272	0.1272
Score	0.108	0.108	0.108
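The exact rule used to go from 300 features to 188 is not spelled out above. One plausible reading, sketched here under that assumption, is to keep only the features whose importance score exceeds a threshold. The importance values could come from a fitted LightGBM model (for example the `feature_importances_` attribute of `LGBMRegressor`); the helper name, the threshold, and the numbers below are hypothetical.

```python
def select_features(names, importances, threshold=0.0):
    """Return the feature names whose importance exceeds `threshold`."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    return [n for n, imp in zip(names, importances) if imp > threshold]


# Hypothetical importances for five of the f_0..f_299 columns.
names = [f"f_{i}" for i in range(5)]
importances = [12, 0, 7, 0, 3]

kept = select_features(names, importances)
print(kept)
```

The retained names can then be used to subset the training columns before refitting, which is where the reduction in training time would come from.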

DNN with Swish Activation Function

*source: medium

Swish, defined as f(x) = x * sigmoid(x), is a smooth, non-monotonic activation function that has been reported to match or outperform ReLU on deep networks.
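To make the "smooth, non-monotonic" description concrete, here is a minimal plain-Python sketch of Swish, taking the usual form x * sigmoid(beta * x). The function names and the optional `beta` parameter are assumptions for illustration; in practice the framework's built-in activation would be used.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


# Swish dips below zero for negative inputs and then rises back toward zero,
# so it is not monotonic.
for x in (-5.0, -1.0, 0.0, 1.0):
    print(x, round(swish(x), 4))
```

Unlike ReLU, the output at -1.0 is lower than the output at -5.0, which is the non-monotonic dip.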


Parameter	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6
Learning Rate	0.001	0.001	0.0005	0.0025	0.0005	0.0005
Epochs	30	50	50	50	30	20
Pearson Corr.	0.1220	0.1164	0.1140	0.1193	0.1100	0.1194
Score	0.15	0.149	0.146	0.142	0.147	0.144
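For readers unfamiliar with the Pearson Corr. and Score rows in these tables, the sketch below shows one way to compute a Pearson correlation and to average it per time_id. It assumes the competition score is the mean of the per-time_id Pearson correlations between predictions and targets; the helper names and the toy data are made up for illustration.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_pearson_by_time(time_ids, y_true, y_pred):
    """Average the Pearson correlations computed separately for each time_id."""
    groups = defaultdict(lambda: ([], []))
    for t, yt, yp in zip(time_ids, y_true, y_pred):
        groups[t][0].append(yt)
        groups[t][1].append(yp)
    scores = [pearson(true, pred) for true, pred in groups.values()]
    return sum(scores) / len(scores)


# Made-up example: two time IDs with three investments each.
time_ids = [0, 0, 0, 1, 1, 1]
y_true = [0.1, -0.2, 0.3, 0.0, 0.5, -0.5]
y_pred = [0.2, -0.1, 0.4, 0.5, 0.0, -0.5]

result = mean_pearson_by_time(time_ids, y_true, y_pred)
print(result)
```

Note that `pearson` divides by zero if either sequence is constant within a time_id, so real data would need a guard for that case.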

DNN with Mish Activation

source: https://krutikabapat.github.io/Swish-Vs-Mish-Latest-Activation-Functions/

Our research suggested that Mish works better than Swish when dropout rates between 0.2 and 0.75 are used, but that was not the case when we applied it to our DNN model. On average, Swish produced better scores than Mish.
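For reference, Mish is usually defined as x * tanh(softplus(x)). The plain-Python sketch below compares it with Swish (beta = 1) on a few inputs. It is an illustration under that standard definition, not the layer code from our notebooks, and the function names are assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def swish(x: float) -> float:
    """Swish with beta = 1, for side-by-side comparison (small inputs only)."""
    return x / (1.0 + math.exp(-x))


# Print input, Swish output, and Mish output side by side.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(swish(x), 4), round(mish(x), 4))
```

The two curves are very close in shape, which may be part of why the difference in scores between them was small in our runs.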


Parameter	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6
Dropout Rate	0.2	0.2	0.5	0.5	0.1	0.1
Epochs	30	50	50	100	50	30
Pearson Corr.	0.1380	0.1280	0.1314	0.1338	0.1434	0.1319
Score	0.143	0.146	0.143	0.143	0.146	0.143

Optuna Study for Swish and Relu Activation on Model 1 Base DNN

An Optuna optimization study was carried out to evaluate the performance of the Swish and ReLU activation functions using the script here. The DNN was set to run 4 epochs per trial, and 30 trials were carried out. The results for the MSE score obtained from the study are shown below.

The Swish activation function performed slightly better, consistent with the earlier experiments.

Optuna Study for Dropout layers in Model 2 DNN

A study was also carried out to investigate the effect of the two dropout layers in Model 2. The DNN was set to run 4 epochs per trial, and 100 trials were carried out. The dropout variables "dropout_1" and "dropout_2" were optimized over the range 0.1 to 0.9. The best value was obtained at {'dropout_1': 0.13889522793629328, 'dropout_2': 0.694167488259274}. The results for the MSE score obtained from the study are shown below.
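The shape of such a study can be sketched without any dependencies. The example below is not Optuna: it substitutes plain uniform random search for Optuna's sampler, and `toy_validation_mse` is a hypothetical stand-in for training Model 2 for 4 epochs and returning its validation MSE. Only the parameter names, the 0.1 to 0.9 range, and the 100 trials are taken from the description above.

```python
import random


def toy_validation_mse(dropout_1: float, dropout_2: float) -> float:
    """Hypothetical stand-in for training Model 2 and returning its MSE."""
    return 0.9 + (dropout_1 - 0.5) ** 2 + (dropout_2 - 0.5) ** 2


def random_search(objective, n_trials: int, low: float, high: float, seed: int = 0):
    """Sample both dropout rates uniformly and keep the lowest-MSE trial."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        params = {
            "dropout_1": rng.uniform(low, high),
            "dropout_2": rng.uniform(low, high),
        }
        value = objective(**params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


best_params, best_value = random_search(
    toy_validation_mse, n_trials=100, low=0.1, high=0.9
)
print(best_params, best_value)
```

In the real study, the objective would build and train the DNN with the sampled dropout rates, and Optuna would choose the next trial's values based on earlier results rather than sampling uniformly.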

5. Conclusions

In conclusion, we would like to mention a few ideas that we were unable to investigate further due to time constraints, but that we believe could improve our DNN ensemble model score. The first would be to find the optimal weighted average of the model predictions. The second would be to perform more in-depth parameter tuning; for example, the Optuna library could be used to optimize the number of layers. Finally, we would have liked to test the effectiveness of using the LGBM results to reduce the number of features input to the DNN.
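As a starting point for the first idea, the sketch below grid-searches blend weights on a held-out validation set and keeps the combination with the highest Pearson correlation. This is an untested idea rather than something we ran: the helper names, the weight grid, and the toy predictions are all hypothetical.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def blend(preds, weights):
    """Weighted average of several models' predictions."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total
            for i in range(len(preds[0]))]


def best_weights(preds, y_true, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid-search blend weights that maximize Pearson correlation."""
    best_w, best_score = None, -math.inf
    for w in product(grid, repeat=len(preds)):
        if sum(w) == 0:
            continue  # skip the all-zero combination
        score = pearson(blend(preds, w), y_true)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Made-up validation targets and predictions from three models.
y_true = [0.1, -0.3, 0.2, 0.4, -0.2]
preds = [
    [0.0, -0.1, 0.3, 0.1, -0.2],
    [0.2, -0.2, 0.0, 0.3, 0.1],
    [-0.1, -0.4, 0.1, 0.2, 0.0],
]

weights, score = best_weights(preds, y_true)
print(weights, score)
```

Because the grid includes the single-model weightings, the best blend is never worse on the validation set than the best individual model, although it may still overfit to that set.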

6. References

Keras Documentation
Optuna
Fast Data Loading and Low Mem with Parquet Files by Rob Mulla
End to end simple and powerful DNN with LeakyReLU by pythonash
Using LightGBM for feature selection by Melanie774
Ubiquant Market Prediction [ DNN ] by Shamia Aftab
【Infer】DNN model ensemble by 老肥
NVIDIA course: Modeling Time Series Data with Recurrent Neural Networks in Keras
Keras: Multiple Inputs and Mixed Data

7. Authors

Amy
Nancy
Chris
Sitwala
