This repository details the work done by Team C for the Module 2 task of the Africa Data Science Intensive (DSI) program. The goal of the task was to compete in the Ubiquant time-series prediction competition on Kaggle. Time-series forecasting is a critical part of data science, with use cases such as epidemiology, inventory planning for businesses, and stock trading. The models and approaches used for the competition are detailed here and in our main notebook.
File | Description |
---|---|
Main Notebook | Main notebook with EDA and Discussions on Models |
Ensemble Inference | Notebook that loads weights and makes ensemble prediction |
Model 1 Training | Model 1 DNN Model used in Ensemble Model |
Model 2 Training | Model 2 DNN Model used in Ensemble Model |
Model 3 Training | Model 3 DNN Model used in Ensemble Model |
Investment_ID Clustering | Kmeans Clustering of Investment IDs |
Model 1 DNN Optimization | Study of ReLU vs. Swish activation in Model 1 |
Model 2 DNN Optimization | Study of dropout layers in Model 2 |
Light GBM | Light GBM notebook for feature importance and model training |
EDA and Clustering | EDA and hierarchical clustering of investment IDs using Pearson correlations |
Data Preparation (LSTM) | Data preprocessing for multivariate time-series model with LSTM |
Training (LSTM) | Training of multi-variate time series model with LSTM |
Time Series Forecasting | Predict target using multivariate time-series model with LSTM |
Competition description, taken from Kaggle:
"Regardless of your investment strategy, fluctuations are expected in the financial market. Despite this variance, professional investors try to estimate their overall returns. Risks and returns differ based on investment types and other factors, which impact stability and volatility. To attempt to predict returns, there are many computer-based algorithms and models for financial market trading. Yet, with new techniques and approaches, data science could improve quantitative researchers' ability to forecast an investment's return."
"In this competition, you’ll build a model that forecasts an investment's return rate. Train and test your algorithm on historical prices. Top entries will solve this real-world data science problem with as much accuracy as possible."
- row_id - A unique identifier for the row.
- time_id - The ID code for the time the data was gathered. The time IDs are in order, but the real time between the time IDs is not constant and will likely be shorter for the final private test set than in the training set.
- investment_id - The ID code for an investment. Not all investments have data in all time IDs.
- target - The target.
- [f_0:f_299] - Anonymized features generated from market data.
For every model used in this project, we tried different fine-tuning approaches in the hope of improving performance. This section discusses the tuning runs and activation functions we tried, along with their performance and final scores.
Leaky ReLU is an improved version of the ReLU activation function. With ReLU, the gradient is 0 for all negative inputs, which deactivates the neurons in that region and can cause the dying ReLU problem. Leaky ReLU addresses this: instead of defining the function as 0 for negative inputs x, it defines it as a very small linear component of x.
ReLU function:
f(x) = max(0, x)
Leaky ReLU function:
f(x) = max(0.01 * x, x)
source: https://www.mygreatlearning.com/blog/relu-activation-function/
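The two definitions above can be checked numerically. The following is a minimal NumPy sketch, not the code from our notebooks; the function names and the sample inputs are illustrative assumptions, and the slope 0.01 is the one from the formula above.

```python
import numpy as np

def relu(x):
    """ReLU: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: f(x) = max(alpha * x, x), for a small slope 0 < alpha < 1."""
    return np.maximum(alpha * x, x)

def relu_grad(x):
    """Gradient of ReLU: 0 for negative inputs, 1 for positive inputs."""
    return np.where(x > 0, 1.0, 0.0)

def leaky_relu_grad(x, alpha=0.01):
    """Gradient of Leaky ReLU: alpha for negative inputs, 1 for positive inputs."""
    return np.where(x > 0, 1.0, alpha)

# Illustrative inputs (not taken from the competition data)
x = np.array([-2.0, -0.5, 0.0, 1.5])
print("relu:      ", relu(x))
print("leaky_relu:", leaky_relu(x))
print("relu grad: ", relu_grad(x))
print("leaky grad:", leaky_relu_grad(x))
```

For negative inputs the ReLU gradient is exactly zero, while the Leaky ReLU gradient stays at the small slope, which is what keeps those neurons trainable.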
For the hyperparameter tuning, we modified the Dropout Rate, Learning Rate, and Decay Steps as shown in the table below to compare performance:
Dropout Rate | 0.4 | 0.5 | 0.8 |
Learning Rate | 0.003 | 0.1 | 0.001 |
Decay Steps | 9700 | 10000 | 10000 |
RMSE | 0.9104 | 0.9123 | 0.9128 |
MSE | 0.9151 | 0.9152 | 0.9143 |
Score | 0.144 | 0.143 | 0.143 |
source: https://neptune.ai/blog/lightgbm-parameters-guide
In the first notebook, we ran the model with a fixed learning rate and max_depth, but adjusted these for the last two runs to compare results. This had no effect on the performance of the model. Also, all 300 features were used initially; we then used the built-in feature-importance plot to reduce the feature set to 188 features. Metrics for the reduced feature set are labeled 'new features' in the table below. The only difference the reduced feature set made was a drastic reduction in training time; all other metrics remained essentially the same. Fine-tuning the LightGBM parameters did not improve the model either. As seen from the table below, the metrics and scores remained the same across runs.
Objective | Regression | Regression | Regression |
Metric | MSE | MSE | MSE |
Boosting_type | gbdt | gbdt | gbdt |
lambda_l1 | 2.3e-05 | 2.3e-05 | 2.3e-05 |
lambda_l2 | 0.1 | 0.1 | 0.1 |
num_leaves | 4 | 10 | 4 |
Feature_fraction | 0.5 | 0.6 | 0.5 |
Bagging_fraction | 0.9 | 0.8 | 0.9 |
Bagging_freq | 7 | 6 | 7 |
min_child_samples | 20 | 20 | 20 |
num_iterations | 1000 | 1000 | 1000 |
learning_rate | 0.1 | 0.1 | |
max_depth | 10 | 10 | |
MSE | 0.8055 | 0.8055 | 0.8055 |
MSE(new features) | 0.8052 | 0.8052 | 0.8975 |
RMSE | 0.8975 | 0.8975 | 0.8974 |
RMSE(new features) | 0.8974 | 0.8974 | 0.8974 |
Pearson Corr. | 0.1260 | 0.1260 | 0.1260 |
Pearson Corr.(new features) | 0.1272 | 0.1272 | 0.1272 |
Score | 0.108 | 0.108 | 0.108 |
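For reference, the MSE, RMSE, and Pearson correlation rows in the table above can be computed as in the sketch below. It is a hedged illustration on synthetic data: `regression_metrics` is a hypothetical helper name, the arrays are randomly generated, and the Score row (computed by Kaggle) is not reproduced here.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and Pearson correlation for two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    rmse = float(np.sqrt(mse))  # RMSE is the square root of MSE
    pearson = float(np.corrcoef(y_true, y_pred)[0, 1])
    return {"mse": mse, "rmse": rmse, "pearson": pearson}

# Synthetic targets and weakly correlated predictions (assumption, not real data)
rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = 0.1 * y_true + rng.normal(scale=0.1, size=1000)

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because RMSE is the square root of MSE, the two rows carry the same information: an MSE of 0.8055 corresponds to an RMSE of about 0.8975, as in the first columns of the table.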
*source: medium
Swish is a smooth, non-monotonic function that consistently matches or outperforms ReLU on deep networks.
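As a quick illustration of the non-monotonic shape, here is a minimal NumPy sketch of Swish, assuming the common form f(x) = x * sigmoid(x). The function names and sample inputs are ours for illustration and are not taken from the training notebooks.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    """Swish: f(x) = x * sigmoid(x)."""
    return x * sigmoid(x)

# Illustrative inputs: the output dips below zero and then rises again
x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = swish(x)
print(y)
```

The value at x = -1 is lower than the value at x = -5, so the function first decreases and then increases as x grows, unlike ReLU.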
Learning Rate | 0.001 | 0.001 | 0.0005 | 0.0025 | 0.0005 | 0.0005 |
Epochs | 30 | 50 | 50 | 50 | 30 | 20 |
Pearson Corr. | 0.1220 | 0.1164 | 0.1140 | 0.1193 | 0.1100 | 0.1194 |
Score | 0.15 | 0.149 | 0.146 | 0.142 | 0.147 | 0.144 |
source: https://krutikabapat.github.io/Swish-Vs-Mish-Latest-Activation-Functions/
Our research suggested that Mish works better than the Swish activation function when dropout rates between 0.2 and 0.75 are used, but that was not the case when we applied it to our DNN model. On average, Swish produced better scores than Mish.
Dropout Rate | 0.2 | 0.2 | 0.5 | 0.5 | 0.1 | 0.1 |
Epochs | 30 | 50 | 50 | 100 | 50 | 30 |
Pearson Corr. | 0.1380 | 0.1280 | 0.1314 | 0.1338 | 0.1434 | 0.1319 |
Score | 0.143 | 0.146 | 0.143 | 0.143 | 0.146 | 0.143 |
Optuna Study for Swish and Relu Activation on Model 1 Base DNN
An Optuna optimization study was carried out to evaluate the performance of the swish and relu activation functions using the script here. The DNN was set to run 4 epochs per trial, and 30 trials were carried out. The MSE scores obtained from the study are shown below.
The swish activation function performs slightly better, consistent with the earlier results.
Optuna Study for Dropout layers in Model 2 DNN
A study was also carried out to investigate the effect of the two dropout layers in Model 2. The DNN was set to run 4 epochs per trial, and 100 trials were carried out. The dropout variables "dropout_1" and "dropout_2" were optimized over the range 0.1 to 0.9. The best value was obtained at {'dropout_1': 0.13889522793629328, 'dropout_2': 0.694167488259274}. The MSE scores obtained from the study are shown below.
In conclusion, we would like to mention a few ideas that we were unable to investigate further due to time constraints, but that we believe could improve our DNN ensemble model score. The first would be to find the optimal weighted average of the model predictions. The second would be to perform more in-depth parameter tuning; for example, the Optuna library could be used to optimize the number of layers. Finally, we would have liked to test the effectiveness of using the LightGBM results to reduce the number of features input to the DNN.
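The first idea, searching for a weighted average of the model predictions, could be approached along the lines of the sketch below. It is an untested idea expressed on synthetic data: the three prediction arrays, the grid step, and the use of plain Pearson correlation on a validation set as the selection criterion are all assumptions, and `search_weights` is a hypothetical helper.

```python
import itertools
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def search_weights(preds, y_true, steps=10):
    """Grid-search weights (non-negative, summing to 1) for three models.

    preds has shape (3, n_samples); returns (best_weights, best_score).
    """
    best_w, best_score = None, -np.inf
    for i, j in itertools.product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        w = np.array([i, j, steps - i - j]) / steps
        score = pearson(w @ preds, y_true)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Synthetic validation targets and three noisy "model" predictions (assumption)
rng = np.random.default_rng(42)
y_val = rng.normal(size=5000)
noise_scales = [1.0, 1.5, 2.0]
val_preds = np.stack(
    [0.2 * y_val + rng.normal(scale=s, size=y_val.size) for s in noise_scales]
)

weights, score = search_weights(val_preds, y_val)
single_scores = [pearson(p, y_val) for p in val_preds]
print("weights:", weights)
print("blend score:", score)
print("single-model scores:", single_scores)
```

Because the grid includes the single-model weightings, the selected blend can never score worse on the validation set than the best individual model; whether the gain carries over to unseen data would still need to be checked on a held-out split.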
- Keras Documentation
- Optuna
- Fast Data Loading and Low Mem with Parquet Files by Rob Mulla
- End to end simple and powerful DNN with LeakyReLU by pythonash
- Using LightGBM for feature selection by Melanie774
- Ubiquant Market Prediction [ DNN ] by Shamia Aftab
- 【Infer】DNN model ensemble by 老肥
- NVIDIA course: Modeling Time Series Data with Recurrent Neural Networks in Keras
- Keras: Multiple Inputs and Mixed Data