This project aims to analyze trends in the Paycheck Protection Program (PPP) and to build predictive models that use demographic factors to estimate the likelihood of receiving a loan.
Overarching Business Question: What factors played the largest roles in PPP loan distribution, and can I accurately predict maximum loan values given those factors?
Data Source: https://www.kaggle.com/susuwatari/ppp-loan-data-paycheck-protection-program
R packages: tidyverse, ggplot2, tm, car (for VIF), MASS (for stepAIC)
Attempt 2 will use R for data cleaning and scikit-learn (Python) for analysis.
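The car package is listed above for variance inflation factors (VIF), a multicollinearity check. Because the analysis in Attempt 2 moves to Python, here is a minimal NumPy sketch of the same quantity, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors with an intercept. The function name `vif` and the synthetic demo data are illustrative assumptions, not part of this project or of any library API.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Return the variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an OLS regression
    (with intercept) of column j on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every predictor except column j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = float(((y - A @ coef) ** 2).sum())
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # Perfect collinearity gives R^2 = 1, i.e. an infinite VIF.
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


# Synthetic demo data (not PPP data): b is nearly a copy of a, c is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + rng.normal(scale=0.1, size=500)
c = rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
print(vifs)  # a and b should be far above the common 5-10 rule of thumb; c near 1
```

In practice the columns would be the numeric predictors left after the R cleaning step; predictors with a large VIF are candidates to drop or combine before fitting.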
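MASS is listed above for `stepAIC`, which performs stepwise model selection by AIC. As a rough Python counterpart for Attempt 2, the sketch below does greedy backward elimination for an ordinary least-squares model, scoring each fit with n * log(RSS / n) + 2k, the form R's `extractAIC()` uses for `lm` fits. This is a simplification: `stepAIC` searches in both directions by default and works on general model objects. The function names `ols_aic` and `backward_aic`, the column names, and the synthetic data are all hypothetical.

```python
import numpy as np


def ols_aic(X: np.ndarray, y: np.ndarray) -> float:
    """AIC of an OLS fit with intercept: n * log(RSS / n) + 2 * k.

    k counts the intercept plus one slope per column of X; additive
    constants are dropped, as in R's extractAIC() for lm fits.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ coef) ** 2).sum())
    return n * np.log(rss / n) + 2 * A.shape[1]


def backward_aic(X: np.ndarray, y: np.ndarray, names: list[str]):
    """Greedy backward elimination: repeatedly drop the predictor whose
    removal lowers AIC the most, and stop when no removal helps."""
    keep = list(range(X.shape[1]))
    best = ols_aic(X[:, keep], y)
    while keep:
        # Score every model that drops exactly one remaining predictor.
        trials = [
            (ols_aic(X[:, [c for c in keep if c != j]], y), j) for j in keep
        ]
        aic, drop = min(trials)
        if aic >= best:
            break
        best, keep = aic, [c for c in keep if c != drop]
    return [names[j] for j in keep], best


# Synthetic demo (not PPP data): only x1 and x2 actually drive y.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)
selected, best_aic = backward_aic(X, y, ["x1", "x2", "x3", "x4"])
print(selected, round(best_aic, 2))
```

Pure-noise predictors such as x3 and x4 are usually, but not always, dropped, since AIC's 2k penalty is fairly lenient.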