Loan Approval Prediction using Machine Learning classifier algorithms

itzKshitijaC/Loan-Approval-Prediction-

Loan Approval Prediction 📊

Background 🔍

Loans are a major requirement of the modern world, and they account for a major part of banks' total profit. They help students manage education and living expenses, and they let people buy big-ticket items such as houses and cars. Deciding whether an applicant's profile qualifies for a loan, however, requires banks to weigh many factors. This project develops a model that predicts whether an applicant's loan will be approved, using background information such as the applicant's gender, marital status, and income.

Dataset Information 🔍

Download Dataset from here

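The dataset contains 13 columns. Twelve are described below; the remaining one is presumably the approval label that the models predict.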

1. Loan_ID: A unique identifier for each loan application. It doesn't contribute to the decision-making process but can be useful for record-keeping.

2. Gender: Lending institutions might consider gender as a factor in loan approval, depending on historical data or institutional policies. For instance, if there's evidence of gender-based discrimination, it could affect loan approval.

3. Married: Married individuals may be perceived as more financially stable and responsible. Lenders might be more inclined to approve loans for married applicants.

4. Dependents: The number of dependents could influence loan approval, as more dependents might mean higher financial responsibilities. Lenders may assess the applicant's ability to repay the loan considering their family size.

5. Education: The level of education might be a proxy for the applicant's earning potential and financial stability. Graduates may be perceived as having better job prospects and, consequently, higher repayment capabilities.

6. Self-employed: Self-employed individuals may face different income patterns compared to salaried individuals. Lenders might scrutinize the stability of self-employed applicants' income sources.

7. ApplicantIncome: Higher income generally indicates a better ability to repay a loan. However, extremely high or low incomes might be red flags. Lenders may set income thresholds for loan approval.

8. CoapplicantIncome: The income of the coapplicant can supplement the household income, affecting the overall repayment capacity. A higher coapplicant income may positively influence loan approval.

9. LoanAmount: The amount of the loan applied for is crucial. Lenders will assess whether the requested loan amount aligns with the applicant's income and financial situation.

10. Loan_Amount_Term: The term of the loan affects monthly repayment amounts. Shorter terms might indicate a quicker repayment ability, while longer terms might be associated with higher overall interest payments.

11. Credit_History: This is likely one of the most critical factors. A good credit history (1.0) is generally associated with a higher likelihood of loan approval. Lenders heavily rely on credit history to assess risk.

12. PropertyArea: The location of the property can influence loan approval. Urban areas might have different risk profiles than rural areas, and lenders may have specific criteria for different regions.

Life Cycle of ML Project 🔍

1. Data Collection

Gather relevant data for model training. This may include historical loan data, customer information, credit scores, employment history, and other relevant features. Ensure that the data is representative and diverse.

2. Data Cleaning

Clean the data to handle missing values, outliers, and inconsistencies. This step is crucial for the model's accuracy and generalization. We may need to impute missing values, standardize or normalize features, and deal with any data anomalies.
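The imputation step can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual cleaning code: the `impute` helper, the toy records, and the choice of median for numeric columns and mode for categorical columns are all assumptions made for the example. Credit_History is treated as categorical here because it is a binary flag.

```python
from statistics import median, mode


def impute(rows, numeric_cols, categorical_cols):
    """Return copies of `rows` with None values filled in.

    Numeric columns get the column median; categorical columns get the
    most frequent value (mode). Both are computed from observed values only.
    """
    fill = {}
    for col in numeric_cols:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    # Build new dicts so the input records are left untouched.
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical toy records; the values are made up for illustration.
applications = [
    {"LoanAmount": 120.0, "Credit_History": 1.0, "Gender": "Male"},
    {"LoanAmount": None, "Credit_History": 1.0, "Gender": "Female"},
    {"LoanAmount": 200.0, "Credit_History": None, "Gender": "Male"},
    {"LoanAmount": 150.0, "Credit_History": 0.0, "Gender": None},
]

cleaned = impute(applications, ["LoanAmount"], ["Credit_History", "Gender"])
```

To avoid leaking information from the test set, the fill values would normally be computed on the training rows only and then reused for the test rows.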

3. Exploratory Data Analysis (EDA)

Conduct exploratory data analysis to understand the relationships between different variables, identify patterns, and gain insights. Visualization tools can be helpful in this phase.

4. Feature Engineering

Create new features or modify existing ones to improve the model's performance. This might involve transforming variables, creating interaction terms, or encoding categorical variables.
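As an illustration of the transformations mentioned above, the sketch below derives a few features from one application record. The derived feature names (`TotalIncome`, `LogTotalIncome`, `LoanToIncome`, `Graduate`) and the category values `"Yes"` and `"Graduate"` are assumptions for this example, not something the project is confirmed to use.

```python
import math


def engineer(row):
    """Return a new feature dict derived from one raw application row."""
    total_income = row["ApplicantIncome"] + row["CoapplicantIncome"]
    return {
        # Combined income of applicant and coapplicant.
        "TotalIncome": total_income,
        # log1p compresses the long right tail typical of income data.
        "LogTotalIncome": math.log1p(total_income),
        # Requested amount relative to income; guard against division by zero.
        "LoanToIncome": row["LoanAmount"] / total_income if total_income else 0.0,
        # Binary encodings of two-valued categorical columns
        # (the category strings are assumed, see the note above).
        "Married": 1 if row["Married"] == "Yes" else 0,
        "Graduate": 1 if row["Education"] == "Graduate" else 0,
    }


# Hypothetical record with made-up values.
raw = {
    "ApplicantIncome": 4000,
    "CoapplicantIncome": 1000,
    "LoanAmount": 100,
    "Married": "Yes",
    "Education": "Not Graduate",
}
features = engineer(raw)
```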

5. Data Splitting

Split the dataset into training and testing sets. The training set is used to train the model, and the testing set is used to evaluate its performance.
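A shuffled hold-out split can be sketched in a few lines. The helper name `train_test_split_rows`, the 80/20 ratio, and the seed are assumptions for illustration; the source does not state which split the project used.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 100 application records.
rows = list(range(100))
train, test = train_test_split_rows(rows, test_fraction=0.2, seed=42)
```

Fixing the seed matters when comparing models: every model should be scored on the same held-out rows.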

6. Model Selection

Choose an appropriate machine learning algorithm for your problem. Common algorithms for loan approval prediction include logistic regression, decision trees, random forests, and support vector machines.

7. Model Training

Train the selected model using the training dataset. This involves feeding the algorithm the features and corresponding labels and letting it learn the patterns in the data.

8. Model Evaluation

Evaluate the model's performance on the testing dataset using appropriate metrics such as accuracy, precision, recall, and F1 score.
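The four metrics named above all follow from the counts of true/false positives and negatives. The sketch below computes them for a small, made-up set of labels; the function name and the example labels are hypothetical and are not results from this project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted/actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = approved, 0 = rejected.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

For these labels there are 4 true positives, 1 false positive, 1 false negative, and 2 true negatives, which gives an accuracy of 0.75 and precision, recall, and F1 of 0.8 each.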

Machine Learning Models used in the Project 🔍

  1. Logistic Regression
  2. Decision Tree Classifier
  3. Random Forest Classifier
  4. KNeighbors Classifier
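The four model names above match estimator classes in scikit-learn, so the sketch below assumes that library; the source does not confirm it. The data is synthetic (generated with `make_classification`), and the hyperparameters are illustrative defaults, so the scores it produces say nothing about the loan dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; NOT the loan dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hyperparameters below are assumptions chosen for illustration.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNeighbors Classifier": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For classifiers, score() returns mean accuracy on the given data.
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```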

Model Training 🔍

  • Four machine learning models were trained and evaluated: Logistic Regression, Decision Tree, Random Forest, and K-Nearest Neighbors (KNN).
  • Features such as ApplicantIncome, CoapplicantIncome, LoanAmount, and Credit_History were considered important predictors for loan approval.

Prediction and Evaluation 🔍

  • The models' performance was evaluated using accuracy, precision, recall, and F1-score metrics.
  • Cross-validation was used to ensure the robustness of the models.
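The idea behind k-fold cross-validation can be sketched without any library: split the rows into k folds, hold each fold out once, and average the scores. The helper names, the majority-class baseline used as the "model", and the toy labels are all assumptions for illustration; the project's own cross-validation setup is not described in the source.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit_predict, X, y, k=5):
    """Average held-out accuracy of `fit_predict` over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Predict the most common training label for every test row."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)


# Toy data: 7 approvals followed by 3 rejections (made-up labels).
X = list(range(10))
y = [1] * 7 + [0] * 3
baseline_cv = cross_val_accuracy(majority_baseline, X, y, k=5)
```

Because these folds are contiguous, sorted data like the toy labels above gives very uneven per-fold scores; shuffling or stratifying the rows first is the usual remedy. A majority-class baseline like this one is also a useful reference point when reading model accuracies.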

Conclusion 🔚😀

Here are the accuracies of the models:

Logistic Regression:

Initial accuracy: 77.27%

Cross-validation accuracy: 80.95%

Decision Tree Classifier:

Accuracy: 80.09%

Random Forest Classifier:

Accuracy: 88.15%

K-Nearest Neighbors (KNN):

Accuracy: 72.51%

The Random Forest Classifier achieved the highest accuracy at 88.15%, making it the best-performing model among those evaluated for predicting loan approval in this project.