Predicts keyword-keyword co-occurrence in scientific literature using features derived from the associated metadata.
- NetworkX 2.2
- Keras
- Theano
- statsmodels
- scipy
- numpy
- matplotlib
- pandas
- scikit-learn (sklearn)
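
As a rough aid for checking an environment against the packages listed above, the sketch below reports the installed version of each one. It is not the repository's versions.py: the `REQUIRED` list and the `installed_versions` helper are hypothetical names, and the PyPI distribution names are assumed.

```python
from importlib import metadata

# Distribution names as published on PyPI (an assumption; for example,
# the module imported as "sklearn" is installed as "scikit-learn").
REQUIRED = [
    "networkx", "keras", "theano", "statsmodels", "scipy",
    "numpy", "matplotlib", "pandas", "scikit-learn",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'not installed'}")
```

Only NetworkX has a pinned version (2.2) in the list, so the other entries are worth recording alongside any results you reproduce.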
- link_prediction.ipynb - End-to-end link prediction experiment (direct classification on the features, with no forecasting step)
- timeseries_forecast.ipynb - End-to-end forecast -> classification experiment (features are forecast first, then classified; this is the approach used in this experiment)
- Link_Analysis.ipynb - Data generation for keyword network evolution and its associated characteristics
- timeseries.ipynb - LSTM-based time-series analysis of nodal degree
- graphs.py - Contains required functions to build, save and load graphs
- utils.py - Utility functions
- classification.py - Initial training-test set preparation, model training and evaluation
- feature_selection.py - Node level and edge level feature generation
- versions.py - Checks the versions of different packages
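
To make the graph-building and feature-generation steps more concrete, here is a minimal plain-Python sketch of a keyword co-occurrence network and a few edge-level features. It is an illustration under stated assumptions, not the code in graphs.py or feature_selection.py: the repository uses NetworkX, the function names are hypothetical, and the three neighbourhood features shown are standard link-prediction stand-ins rather than the metadata-based features the project actually uses.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(papers):
    """Return {keyword: {neighbour: weight}} from an iterable of keyword lists.

    Every pair of keywords on the same paper is linked; the weight counts
    how many papers the pair shares.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for keywords in papers:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def edge_features(graph, u, v):
    """Neighbourhood-based features for the candidate pair (u, v)."""
    nu, nv = set(graph.get(u, ())), set(graph.get(v, ()))
    common, union = nu & nv, nu | nv
    return {
        "common_neighbours": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
    }


# Toy input: integer keyword ids, one list per paper (made-up data).
papers = [[1, 2, 3], [2, 3, 4], [1, 3]]
graph = build_cooccurrence_graph(papers)
features = edge_features(graph, 1, 4)
```

Keywords 1 and 4 never appear on the same paper in the toy data, so they form an unlinked candidate pair of the kind a classifier would score.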
- Apnea dataset - All keywords have a degree of at least 3
- Apnea keyword lists - Keyword list with integer IDs
- Obesity dataset - All keywords have a degree of at least 3
- Obesity keyword lists - Keyword list with integer IDs
- Open the link_prediction notebook. The end-to-end link prediction experiment is run here (build, save, and load the graph -> prepare, save, and load the training data -> train, save, and evaluate the model -> save and load the results -> generate and save the figures).
- The experimental analysis of keyword network evolution is done in the Link_Analysis notebook.
- LSTM time-series forecasting of the nodal degree of the top-3 central keywords is done in the timeseries notebook, which then plots the ground truth against the predicted values.
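
The nodal-degree forecasting described above needs a degree series per keyword and a way to turn that series into training samples. The sketch below shows one possible preparation step in plain Python; it does not reproduce the timeseries notebook. The helper names, the choice of a cumulative (rather than per-year) network, and the window length are all assumptions, and the LSTM itself is left out.

```python
from collections import defaultdict
from itertools import combinations


def degree_series(papers_by_year, keyword):
    """Degree of `keyword` in the cumulative co-occurrence network, per year."""
    neighbours = defaultdict(set)
    series = []
    for year in sorted(papers_by_year):
        for keywords in papers_by_year[year]:
            for a, b in combinations(set(keywords), 2):
                neighbours[a].add(b)
                neighbours[b].add(a)
        series.append((year, len(neighbours[keyword])))
    return series


def sliding_windows(values, window):
    """Split a sequence into (past `window` values, next value) pairs."""
    return [
        (values[i:i + window], values[i + window])
        for i in range(len(values) - window)
    ]


# Made-up example: keyword ids per paper, grouped by publication year.
papers_by_year = {
    2010: [[1, 2]],
    2011: [[1, 3], [2, 3]],
    2012: [[1, 4, 5]],
    2013: [[1, 6]],
}
series = degree_series(papers_by_year, keyword=1)
degrees = [degree for _, degree in series]
samples = sliding_windows(degrees, window=2)
```

Each sample pairs a short history of degrees with the next observed value, which is the supervised form a sequence model such as an LSTM is typically trained on.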
- Fahim Faisal
- Dr. Nazim Choudhury