Final project for the course "Effective Programming Practices for Economists"
- Visualize feature importance with SHAP (see the SHAP NIPS paper for details), using a model trained with the LightGBM framework.
- Conduct feature selection and compare the resulting model performance.
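One common way to turn SHAP values into a feature ranking is to average their absolute values per feature and keep the top few. The sketch below illustrates that idea only; it is an assumption about how such a selection could work, not the method used in this repo. The function names, the feature names, and the small SHAP matrix are all hypothetical, made-up values.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values: list of rows, one list of per-feature SHAP values per sample.
    feature_names: list of names, one per column of shap_values.
    """
    n_rows = len(shap_values)
    importance = {}
    for j, name in enumerate(feature_names):
        # Average the magnitude of feature j's contribution over all samples.
        importance[name] = sum(abs(row[j]) for row in shap_values) / n_rows
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


def select_top_k(ranking, k):
    """Keep the names of the k highest-ranked features."""
    return [name for name, _ in ranking[:k]]


# Hypothetical SHAP matrix (3 samples x 3 features), for illustration only.
shap_values = [
    [0.5, -0.1, 0.0],
    [-0.5, 0.2, 0.25],
    [1.0, -0.3, -0.25],
]
feature_names = ["feature_a", "feature_b", "feature_c"]

ranking = rank_features_by_mean_abs_shap(shap_values, feature_names)
selected = select_top_k(ranking, k=2)
print(selected)
```

In practice the SHAP matrix would come from an explainer applied to the trained LightGBM model, and the model would be retrained on the selected columns to compare performance.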
Disclaimer: I have already provided the LGBM parameters in this repo, since optimizing them takes a long time. If you want to optimize them yourself, run the script parameter_visualization.py in the folder code.
Since the data is too big to be stored on GitHub, you can download it from my OneDrive.
Then just run main.py.
However, since the process takes very long with the whole data, you can reduce the number of rows used by setting the variable "num_rows" to the number of rows you want to process. Currently, its value is set to 20000.
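As a rough illustration of what a row limit like "num_rows" does, the sketch below reads only the first rows of a CSV file. It is not taken from main.py: the helper name read_first_rows, the column names, and the sample values are hypothetical, and the project's own loading code may differ.

```python
import csv
import io
from itertools import islice


def read_first_rows(handle, num_rows=None):
    """Return the header and at most num_rows data rows from an open CSV file.

    num_rows=None reads the whole file.
    """
    reader = csv.reader(handle)
    header = next(reader)
    # islice stops after num_rows rows; with None it consumes everything.
    rows = list(islice(reader, num_rows))
    return header, rows


# Tiny in-memory stand-in for the real data file (made-up values).
sample = io.StringIO("id,feature_a,target\n1,0.5,0\n2,0.7,1\n3,0.1,0\n")
header, rows = read_first_rows(sample, num_rows=2)
print(header, len(rows))
```

If the data is loaded with pandas instead, read_csv accepts an nrows argument that serves the same purpose.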