Home Loan approval prediction based on 2020 HMDA large scale (10GB) dataset

Dhruv-praju/Mortgage-approval-system


Mortgage Approval Prediction System

This is a home loan approval prediction platform. It predicts the decision on a home loan application from the borrower's financial information, property information, geographic context, and loan application details.

Below is the diagram of the ML pipeline used to train the model for the platform. The pipeline was trained on the large-scale (~10GB) official 2020 U.S. Home Mortgage Disclosure Act (HMDA) dataset, and the trained ML model was integrated into the web application to make predictions.

ML pipeline

[Diagram: ML pipeline]

Stack

  • PySpark was used to build the entire pipeline
  • An AWS EMR / GCP Dataproc cluster (1 master machine and 3 worker machines) was used as compute to train the pipeline
  • AWS S3 was used as the data source and warehouse
  • Streamlit was used to build the web app

Files Overview

2020_lar.txt - dataset file (~10GB and ~25M rows)
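
Because the file is far too large to open in an editor, it can help to stream only the header and a few rows before running the full pipeline. The following is a minimal sketch using only the Python standard library. It assumes the file is pipe-delimited text with a header row; that is an assumption about the file layout, and `peek_rows` is a hypothetical helper that is not part of this repository.

```python
import csv
import itertools


def peek_rows(lines, n=5, delimiter="|"):
    """Return the header and the first n data rows from an iterable of text lines.

    The "|" delimiter is an assumption about 2020_lar.txt; pass another
    delimiter if the file is laid out differently.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    # islice stops after n rows, so the rest of the input is never read.
    rows = [dict(zip(header, row)) for row in itertools.islice(reader, n)]
    return header, rows


# Usage with the real file (the file object is consumed lazily):
# with open("2020_lar.txt", newline="") as f:
#     header, rows = peek_rows(f)
#     print(header)
```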

Final-GCP.py - contains the PySpark code that builds the ML pipeline, with stages for ingestion, cleaning, preprocessing, feature engineering, model training, and model evaluation; it was run on the GCP cluster

models - folder containing the trained transformer and ML models (trained on the GCP cluster)

app.py - contains the web UI that makes predictions from input fields
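
As a rough illustration of how such stages can be chained in PySpark, here is a minimal sketch. It is not the code in Final-GCP.py. The column names (`loan_type`, `loan_purpose`, `loan_amount`, `income`), the use of the HMDA `action_taken` code as the label source, and the choice of logistic regression are all assumptions made for the example.

```python
# Hypothetical feature columns; the real feature set in Final-GCP.py may differ.
CATEGORICAL_COLS = ["loan_type", "loan_purpose"]
NUMERIC_COLS = ["loan_amount", "income"]


def action_taken_to_label(code):
    """Map an HMDA action_taken code to 1 (approved), 0 (denied), or None (excluded).

    Treating codes 1 and 2 as approvals and 3 as a denial is an assumption
    about how the label could be defined, not the repository's definition.
    """
    if code in (1, 2):  # loan originated, or approved but not accepted
        return 1
    if code == 3:       # application denied
        return 0
    return None         # withdrawn, incomplete, purchased, preapproval outcomes


def build_pipeline(categorical_cols=CATEGORICAL_COLS, numeric_cols=NUMERIC_COLS):
    """Assemble an unfitted Pipeline; call this after a SparkSession exists."""
    # Imported lazily so the helper above works without a Spark install.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler

    # Index each categorical column, keeping unseen values in an extra bucket.
    indexers = [
        StringIndexer(inputCol=c, outputCol=f"{c}_idx", handleInvalid="keep")
        for c in categorical_cols
    ]
    encoder = OneHotEncoder(
        inputCols=[f"{c}_idx" for c in categorical_cols],
        outputCols=[f"{c}_ohe" for c in categorical_cols],
    )
    # Combine encoded and numeric columns into one vector; rows with nulls are skipped.
    assembler = VectorAssembler(
        inputCols=[f"{c}_ohe" for c in categorical_cols] + list(numeric_cols),
        outputCol="features",
        handleInvalid="skip",
    )
    classifier = LogisticRegression(featuresCol="features", labelCol="label")
    return Pipeline(stages=indexers + [encoder, assembler, classifier])
```

Calling `build_pipeline().fit(train_df)` on a DataFrame that has a numeric `label` column would return a fitted `PipelineModel`, which can be saved and loaded later for inference.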
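
A form of this kind can be sketched with Streamlit as follows. This is an illustration only, not the contents of app.py: the input fields, the `to_feature_row` helper, and the `predict` callable (standing in for whatever wraps the trained model) are hypothetical.

```python
# Display names for the HMDA loan_type codes used in the form.
LOAN_TYPES = {1: "Conventional", 2: "FHA", 3: "VA", 4: "USDA/RHS"}


def to_feature_row(loan_amount, income, loan_type):
    """Pack form inputs into a single record keyed by (assumed) column names."""
    return {
        "loan_amount": float(loan_amount),
        "income": float(income),
        "loan_type": int(loan_type),
    }


def render_app(predict):
    """Draw the form; `predict` is a hypothetical callable returning 1 or 0."""
    # Imported lazily so to_feature_row can be used without Streamlit installed.
    import streamlit as st

    st.title("Mortgage Approval Prediction")
    loan_amount = st.number_input("Loan amount (USD)", min_value=0.0, step=1000.0)
    income = st.number_input("Annual income (thousands of USD)", min_value=0.0, step=1.0)
    loan_type = st.selectbox(
        "Loan type", list(LOAN_TYPES), format_func=lambda code: LOAN_TYPES[code]
    )

    if st.button("Predict"):
        row = to_feature_row(loan_amount, income, loan_type)
        st.write("Approved" if predict(row) == 1 else "Denied")


# In a Streamlit script, render_app(predict) would be called at module level,
# with `predict` wrapping the model loaded from the models folder.
```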

Download and run

To run the application, run the following command:

streamlit run app.py
