This project uses Azure Machine Learning Workbench, scikit-learn, and Keras to perform sequence classification on financial transaction data, predicting whether someone is likely to default.
This binary classification task is implemented in three ways:
- Support Vector Machine: Traditional ML approach with feature engineering, using an SVM to handle high-dimensional, sparse feature vectors.
- Long Short-Term Memory: Deep learning approach using an LSTM, a natural choice for sequence classification given its ability to learn long-term dependencies.
- Convolutional Neural Network: Extending CNNs beyond their traditional image-based tasks, using 1D convolution (Conv1D) to find local patterns along the transaction sequence. On our dataset, the CNN achieved the highest accuracy and shorter training times than the LSTM.
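To make the Conv1D idea concrete, here is a minimal NumPy sketch of what a single 1D convolution layer computes on one transaction sequence. It is not the project's model code: `conv1d_valid` is a hypothetical helper, and it assumes "valid" padding, stride 1, and the `(kernel_size, in_features, filters)` kernel layout used by Keras `Conv1D`. Each filter slides along the time axis and scores every window of consecutive transactions, so it responds to a local pattern wherever in the sequence that pattern appears.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """Hypothetical helper: naive 1D convolution over the time axis.

    Uses 'valid' padding and stride 1 (no kernel flip, as in deep learning
    frameworks).
    x:       (timesteps, in_features)  one customer's transaction sequence
    kernels: (kernel_size, in_features, filters)
    bias:    (filters,)
    returns: (timesteps - kernel_size + 1, filters)
    """
    kernel_size, _, filters = kernels.shape
    steps = x.shape[0] - kernel_size + 1
    out = np.empty((steps, filters))
    for t in range(steps):
        window = x[t:t + kernel_size]  # (kernel_size, in_features)
        # Each filter is a weighted sum over every value in the window.
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out

# Usage: one feature per transaction (e.g. an amount) and a single
# filter that responds to an increase between consecutive transactions.
x = np.array([[1.0], [3.0], [2.0], [5.0]])
kernels = np.array([[[-1.0]], [[1.0]]])  # shape (2, 1, 1)
feature_map = conv1d_valid(x, kernels, bias=np.zeros(1))
# feature_map[:, 0] holds the step-to-step changes: 2, -1, 3
pooled = feature_map.max(axis=0)  # strongest response anywhere in time: 3
```

Taking the maximum of each filter's response over time (as Keras `GlobalMaxPooling1D` does) turns these position-dependent scores into one feature per filter for the final classifier. Because each window is computed independently, a convolution can be parallelized across timesteps, whereas an LSTM must process transactions one step at a time; this is a common reason 1D CNNs train faster than LSTMs.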
The script `preprocess.py` takes the input data and builds a 3D matrix of shape (samples, timesteps, features) to feed into the LSTM network.
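A minimal sketch of this reshaping step, assuming each customer's history has already been converted into a list of per-transaction feature vectors. The input format and the `build_sequence_tensor` helper are hypothetical and not taken from `preprocess.py`. Sequences are left-padded with zeros and truncated to their most recent `timesteps` transactions, mirroring the default `padding='pre'`, `truncating='pre'` behavior of Keras `pad_sequences`.

```python
import numpy as np

def build_sequence_tensor(sequences, timesteps):
    """Hypothetical helper: stack variable-length sequences into a
    (samples, timesteps, features) float32 array.

    Each sequence is a list of per-transaction feature vectors, all with the
    same number of features. Shorter sequences are zero-padded at the front;
    longer ones keep only their last `timesteps` transactions.
    """
    n_features = len(next(seq for seq in sequences if seq)[0])
    X = np.zeros((len(sequences), timesteps, n_features), dtype=np.float32)
    for i, seq in enumerate(sequences):
        if not seq:
            continue  # a customer with no transactions stays all zeros
        recent = np.asarray(seq[-timesteps:], dtype=np.float32)
        X[i, timesteps - len(recent):, :] = recent
    return X

# Usage: two customers, two features per transaction.
customers = [
    [[1.0, 0.5], [2.0, 0.1], [0.3, 0.9]],  # 3 transactions -> truncated
    [[4.0, 0.2]],                          # 1 transaction  -> padded
]
X = build_sequence_tensor(customers, timesteps=2)
# X.shape == (2, 2, 2)
```

Left-padding keeps each customer's most recent transactions at the final timesteps, which is where a recurrent layer's last output is read from.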