This repository contains several projects covering exploratory data analysis, data visualization, prediction, classification, and clustering, using Python libraries and machine learning algorithms.
- Dataset link: https://www.kaggle.com/nikhilmittal/flight-fare-prediction-mh?select=Data_Train.xlsx
- The notebook takes an in-depth look at the dataset: exploratory data analysis, visualization, and data preparation.
- It also includes a feature-selection method based on each feature's correlation with the target attribute.
- Several regression algorithms were tried; RandomForestRegressor reached ~80% accuracy.
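
Correlation-based feature selection can be sketched as below. This is a minimal illustration, not the notebook's actual code: the column names (`duration`, `stops`, `noise`, `price`), the synthetic data, and the `0.3` threshold are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def select_by_target_correlation(df, target, threshold=0.3):
    """Return numeric feature names whose absolute Pearson correlation
    with `target` is at least `threshold`, strongest first."""
    numeric = df.select_dtypes(include="number")
    strength = numeric.corr()[target].drop(target).abs()
    selected = strength[strength >= threshold].sort_values(ascending=False)
    return selected.index.tolist()


# Synthetic stand-in data (hypothetical columns, not the Kaggle dataset).
rng = np.random.default_rng(0)
n = 500
duration = rng.uniform(60, 600, n)   # drives the target
stops = rng.integers(0, 4, n)        # drives the target
noise = rng.normal(0, 1, n)          # unrelated to the target
price = 5 * duration + 800 * stops + rng.normal(0, 300, n)

demo = pd.DataFrame(
    {"duration": duration, "stops": stops, "noise": noise, "price": price}
)
selected = select_by_target_correlation(demo, target="price")
print(selected)
```

Pearson correlation only captures linear relationships, so a feature that is dropped here may still be useful to a tree-based model such as a random forest.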
- Dataset link: https://www.kaggle.com/vikrishnan/boston-house-prices?select=housing.csv
- In this notebook I perform exploratory data analysis and visualization, then apply several machine learning algorithms to predict house prices.
- The highest accuracy was around 73%, obtained with GradientBoostingRegressor, which is a good result for a regression task.
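
For regression, the "accuracy" reported by scikit-learn regressors' `score` method is the coefficient of determination (R²). Assuming that is the figure quoted above, which the notebook itself would need to confirm, the metric can be sketched with NumPy as follows. The numbers are made up for illustration.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not results from the notebook.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
r2 = r2_score_manual(y_true, y_pred)           # approximately 0.925

# A model that always predicts the mean of y_true scores exactly 0.
r2_baseline = r2_score_manual(y_true, [6.0] * 4)
```

An R² of 0.73 would mean the model explains roughly 73% of the variance in house prices, which is a different statement from classifying 73% of examples correctly.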
- Dataset link: https://www.kaggle.com/mlg-ulb/creditcardfraud?select=creditcard.csv
- In this notebook I perform exploratory data analysis and visualization, apply under-sampling, over-sampling, hyperparameter tuning, and outlier handling, and finally apply several machine learning algorithms to classify credit card transactions as fraud or non-fraud.
- I have uploaded two versions of this project:
- Version 1: I apply under-sampling to balance the data. The best results were a recall of 0.92, precision of 0.98, F1 score of 0.95, and accuracy of 0.95, obtained with two algorithms: SVC and Logistic Regression.
- Version 2: I apply the same feature engineering and exploratory data analysis as in Version 1 and change only the method used to balance the imbalanced data, which here is the Synthetic Minority Over-sampling Technique (SMOTE). This gave ~99.94% accuracy using Random Forest Classifier.
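
The under-sampling step used in Version 1 can be sketched with pandas alone. This is a minimal illustration on synthetic data, not the notebook's code: the helper name `random_undersample` and the 20-in-1000 class ratio are assumptions, and only the `Class` and `Amount` column names are borrowed from the dataset. For Version 2, SMOTE is available as the `SMOTE` class in the `imbalanced-learn` package (`imblearn.over_sampling`); it is not shown here to keep the sketch dependency-free.

```python
import numpy as np
import pandas as pd


def random_undersample(df, label_col, seed=0):
    """Randomly drop rows from the larger classes so that every class
    has as many rows as the smallest class."""
    n_min = df[label_col].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Synthetic, heavily imbalanced stand-in data (hypothetical values).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "Amount": rng.exponential(50.0, 1000),
        "Class": [1] * 20 + [0] * 980,   # 2% positives
    }
)
balanced = random_undersample(demo, label_col="Class")
print(balanced["Class"].value_counts())
```

Whichever balancing method is used, it is generally safer to resample only the training split and to evaluate on an untouched, still imbalanced test split, so that the reported metrics reflect the real class distribution.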
- Dataset link: https://www.kaggle.com/uciml/pima-indians-diabetes-database
- The Pima Indians Diabetes dataset involves predicting the onset of diabetes within 5 years in Pima Indians given medical details. It is a binary (2-class) classification problem.
- This notebook contains basic exploratory data analysis and visualization, then applies several algorithms to get good accuracy; the result is ~76% accuracy on the test data and ~92% on the whole dataset.
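
One likely reason the whole-dataset accuracy is higher than the test accuracy is that the whole dataset includes the rows the model was trained on. The sketch below illustrates that effect with a tiny 1-nearest-neighbour classifier written in NumPy. It is a stand-in chosen for brevity, not the model used in the notebook, and the data is synthetic.

```python
import numpy as np


def predict_1nn(X_train, y_train, X_query):
    """Label each query point with the label of its nearest training point."""
    diffs = X_query[:, None, :] - X_train[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    return y_train[sq_dist.argmin(axis=1)]


# Synthetic binary problem with noisy labels (hypothetical data).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)

# Hold out 100 rows for testing; train on the other 300.
order = rng.permutation(n)
train_idx, test_idx = order[:300], order[300:]

pred_test = predict_1nn(X[train_idx], y[train_idx], X[test_idx])
pred_all = predict_1nn(X[train_idx], y[train_idx], X)

acc_test = (pred_test == y[test_idx]).mean()   # held-out rows only
acc_all = (pred_all == y).mean()               # includes the training rows
print(f"test accuracy: {acc_test:.2f}, whole-dataset accuracy: {acc_all:.2f}")
```

Because every training row is its own nearest neighbour, the model is always right on the rows it has seen, which inflates the whole-dataset figure. The held-out test accuracy is the more reliable estimate of performance on new data.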