Data_science_Machine_learning_projects

This repository contains exploratory data analysis (EDA), data visualization, and prediction, classification and clustering of data using Python libraries and machine learning algorithms.

1. Flight Fare Prediction

Dataset link: https://www.kaggle.com/nikhilmittal/flight-fare-prediction-mh?select=Data_Train.xlsx
The notebook gives an in-depth look at the dataset: exploratory data analysis, visualization and data preparation.
It also includes a feature selection method based on each feature's correlation with the target attribute.
I tried several regression algorithms and got ~80% accuracy using RandomForestRegressor.
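A minimal sketch of correlation-based feature selection, written in plain Python so it runs without extra packages. It computes the Pearson correlation of each feature with the target and keeps the features whose absolute correlation reaches a threshold. The column names, toy values and the 0.3 threshold are hypothetical, not taken from the notebook; in a pandas notebook the same scores would typically come from `df.corrwith(df[target])` or `df.corr()[target]`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_by_correlation(features, target, threshold=0.3):
    """Keep features whose |correlation| with the target is at least `threshold`.

    `features` maps a column name to its values; the default threshold is an
    illustrative assumption, not a value from the notebook.
    """
    scores = {name: pearson(values, target) for name, values in features.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    # Strongest correlations first
    selected.sort(key=lambda name: abs(scores[name]), reverse=True)
    return selected, scores

# Toy data with hypothetical column names (not the real Data_Train.xlsx schema)
features = {
    "duration_mins": [60, 120, 180, 240, 300],
    "total_stops": [0, 1, 1, 2, 2],
    "journey_day": [3, 1, 4, 1, 5],
}
price = [3000, 5200, 6100, 8900, 9800]
selected, scores = select_by_correlation(features, price)
print(selected)  # ['duration_mins', 'total_stops']
```

Here `journey_day` is dropped because its correlation with price is only about 0.21. Note that correlation only captures linear relationships, so a feature with low correlation may still help a tree-based model such as a random forest.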

2. Boston House Prediction

Dataset link: https://www.kaggle.com/vikrishnan/boston-house-prices?select=housing.csv
In this notebook I perform exploratory data analysis and visualization, and at the end I apply several machine learning algorithms to predict house prices.
The highest accuracy, around 73%, was achieved with GradientBoostingRegressor, which is a good result for a regression task.
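For regressors, scikit-learn's `score` method (which models such as GradientBoostingRegressor inherit) returns the coefficient of determination R², not a classification accuracy. Assuming the ~73% figure is that score, it means the model explains about 73% of the variance in prices compared with always predicting the mean price. A minimal plain-Python sketch of the metric, on made-up values:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up prices and predictions, for illustration only
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 33.0, 37.0, 50.0]
print(r2_score(y_true, y_pred))      # ≈ 0.974
print(r2_score(y_true, [30.0] * 5))  # 0.0: predicting the mean explains nothing
```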

3. Credit Card Fraud Classification

Dataset link: https://www.kaggle.com/mlg-ulb/creditcardfraud?select=creditcard.csv
In this notebook I perform exploratory data analysis and visualization, apply undersampling and oversampling, tune hyperparameters and handle outliers, and at the end apply several machine learning algorithms to classify each credit card transaction as fraud or non-fraud.
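Random undersampling can be sketched as keeping every minority (fraud) row plus a random subset of majority rows of the same size. This is a minimal plain-Python illustration with toy data and a hypothetical `random_undersample` helper, assuming binary labels where 0 is the larger, non-fraud class; it is not the notebook's code. Libraries such as imbalanced-learn provide `RandomUnderSampler` for the same purpose.

```python
import random

def random_undersample(X, y, majority_label=0, seed=42):
    """Keep all minority rows plus an equally sized random sample of majority rows.

    Assumes binary labels and that the majority class is the larger one.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = sorted(minority_idx + kept_majority)  # preserve original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: 95 non-fraud rows (label 0) and 5 fraud rows (label 1)
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5
X_bal, y_bal = random_undersample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 5 5
```

The trade-off is that most non-fraud rows are discarded, so the balanced training set is small; oversampling methods keep all rows instead.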
I have uploaded two versions of this project:
Version 1: Undersampling is used to balance the data. The best scores were Recall 0.92, Precision 0.98, F1 0.95 and Accuracy 0.95, achieved with two algorithms, SVC and Logistic Regression.
Version 2: The feature engineering and exploratory data analysis are the same as in Version 1; the only change is the method used to turn the imbalanced data into balanced data, which here is the Synthetic Minority Over-sampling Technique (SMOTE). This version reached ~99.94% accuracy with a Random Forest Classifier.
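The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a simplified plain-Python version with a hypothetical `smote` helper and toy 2-D points; it is not the notebook's code, which may instead have relied on imbalanced-learn's `SMOTE(...).fit_resample(X, y)`.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: create n_synthetic points on segments between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class (e.g. fraud rows) with two features
fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(fraud, n_synthetic=6, k=2)
balanced_minority = fraud + new_rows
print(len(balanced_minority))  # 10
```

Because every synthetic point lies between two real minority samples, oversampling is usually applied to the training split only, so synthetic points do not leak into the test data. On heavily imbalanced data like this, recall and precision on the fraud class are also more informative than accuracy alone.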

4. Pima Indians Diabetes

Dataset link: https://www.kaggle.com/uciml/pima-indians-diabetes-database
The Pima Indians Diabetes dataset involves predicting the onset of diabetes within 5 years in Pima Indians, given medical details. It is a binary (2-class) classification problem.
This notebook contains basic exploratory data analysis and visualization; at the end I apply several algorithms and get ~76% accuracy on the test data and ~92% on the whole dataset.
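A gap like ~92% on the whole dataset versus ~76% on the test data is expected when the whole dataset includes rows the model was trained on, so the test figure is the more trustworthy estimate. The sketch below makes this concrete with a 1-nearest-neighbour classifier, chosen only because it memorises its training data (it is not one of the notebook's models), plus hypothetical helpers on synthetic noisy data. In practice scikit-learn's `train_test_split` handles the splitting step.

```python
import math
import random

def train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle row indices and hold out `test_size` of the rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_size)
    train_idx, test_idx = idx[n_test:], idx[:n_test]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]

def accuracy(train_X, train_y, eval_X, eval_y):
    """Fraction of eval rows whose 1-NN prediction matches the true label."""
    hits = sum(predict_1nn(train_X, train_y, x) == label for x, label in zip(eval_X, eval_y))
    return hits / len(eval_y)

# Synthetic data: label is 1 above the diagonal, with ~20% of labels flipped as noise
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(a + b > 1) ^ (rng.random() < 0.2) for a, b in X]

X_train, X_test, y_train, y_test = train_test_split(X, y)
train_acc = accuracy(X_train, y_train, X_train, y_train)  # each training row is its own nearest neighbour
test_acc = accuracy(X_train, y_train, X_test, y_test)     # held-out estimate
whole_acc = accuracy(X_train, y_train, X, y)              # mixes memorised and unseen rows
print(train_acc, test_acc, whole_acc)
```

Since 75% of the whole dataset here is training data scored at 100%, the whole-dataset accuracy is always at least the test accuracy, regardless of how good the model really is.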

Files in this repository:
Boston house price prediction.ipynb
Credit Card Fraud Detection-2.ipynb
Credit card fraud detection-1.ipynb
Flight Fare Prediction.ipynb
Pima Indians Diabetes .ipynb
README.md

patelom5917/Data_science_Machine_learning_projects
