devanmodhavadiya189/Data-Cleaning-and-Predictive-Modeling-Project

Wine Quality Prediction

Overview

This project aims to predict red wine quality from chemical attributes. The dataset contains 1,599 samples with 12 columns: 11 chemical properties and one quality rating. The primary objectives are data preprocessing, statistical analysis, visualization, and predictive modeling.

Project Description

The dataset consists of 11 chemical attributes and a quality score ranging from 3 to 8. The project involves:

  1. Handling missing values.
  2. Performing statistical analysis.
  3. Visualizing data relationships.
  4. Training and evaluating machine learning models.

Data

  • File Name: winequality-red.csv
  • Number of Samples: 1599
  • Columns (11 features plus the target):
    • Fixed acidity
    • Volatile acidity
    • Citric acid
    • Residual sugar
    • Chlorides
    • Free sulfur dioxide
    • Total sulfur dioxide
    • Density
    • pH
    • Sulphates
    • Alcohol
    • Quality (target variable)
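A minimal loading sketch, assuming the CSV header uses the column names of the public UCI red-wine file (for example `fixed acidity`, `pH`, `quality`). The UCI copy is semicolon-separated while many mirrors use commas, so the hypothetical `load_wine_data` helper lets pandas sniff the delimiter and then checks that all 12 columns are present.

```python
import io
from typing import Union

import pandas as pd

# Column names as they appear in the public UCI red-wine CSV header
# (assumption: the project's copy of winequality-red.csv uses the same header).
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def load_wine_data(source: Union[str, io.StringIO]) -> pd.DataFrame:
    """Hypothetical helper: load the wine CSV and verify its 12 columns."""
    # sep=None with the Python engine sniffs the delimiter from the first line,
    # so both semicolon- and comma-separated copies of the file load correctly.
    df = pd.read_csv(source, sep=None, engine="python")
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Return the columns in a fixed order: 11 features, then the target.
    return df[EXPECTED_COLUMNS]
```

Typical usage would be `df = load_wine_data("winequality-red.csv")`, run from the project directory.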

Preprocessing

  1. Null Value Handling: Replaced missing values in numerical columns with the column mean and in categorical columns with the column mode.
  2. Data Cleaning: Removed records with null values where imputation was not appropriate, and imputed the remaining missing data.
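The two preprocessing steps can be sketched with pandas as below. This is one plausible reading of the steps, not the notebook's code: the hypothetical `clean` helper drops rows whose `quality` label is missing (an assumption, since a target value cannot be sensibly imputed), fills numeric gaps with the column mean, and fills non-numeric gaps with the column mode. The standard red-wine file is all-numeric, so the mode branch only matters if a categorical column is present.

```python
import pandas as pd


def clean(df: pd.DataFrame, target: str = "quality") -> pd.DataFrame:
    """Hypothetical helper: drop unlabeled rows, then impute remaining gaps."""
    # Assumption: rows with no target value are removed rather than imputed.
    out = df.dropna(subset=[target]).copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numerical column: replace gaps with the column mean.
            out[col] = out[col].fillna(out[col].mean())
        else:
            # Categorical column: replace gaps with the most frequent value.
            modes = out[col].mode(dropna=True)
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out
```

The input DataFrame is left unchanged; `clean` works on a copy and returns it.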

Statistical Analysis

Performed statistical operations on the dataset, including:

  • Count
  • Sum
  • Range
  • Minimum
  • Maximum
  • Mean
  • Median
  • Mode
  • Variance
  • Standard Deviation
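All of the listed statistics map directly onto pandas methods. The sketch below (hypothetical helper `summary_statistics`) collects them into one table with a row per numeric column. Note that pandas' `var` and `std` default to the sample versions (`ddof=1`); whether the project used sample or population variance is not stated.

```python
import pandas as pd


def summary_statistics(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per numeric column, one column per statistic."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "count": numeric.count(),
        "sum": numeric.sum(),
        "min": numeric.min(),
        "max": numeric.max(),
        "range": numeric.max() - numeric.min(),
        "mean": numeric.mean(),
        "median": numeric.median(),
        # mode() may return several values per column (sorted ascending);
        # keep the first, i.e. the smallest, one.
        "mode": numeric.mode().iloc[0],
        "variance": numeric.var(),  # sample variance (ddof=1), pandas' default
        "std": numeric.std(),       # sample standard deviation (ddof=1)
    })
```

On the wine data, the result is a 12-row table, one row per column of the CSV.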

Visualization

Used the following plot types to visualize the data:

  • Scatter Plots
  • Line Graphs
  • Histograms

These plots show how the chemical attributes relate to one another and to wine quality.
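A sketch of the three plot types with Matplotlib, assuming `alcohol` as the example feature (any column name works). The helper name `plot_overview` is hypothetical, and the line graph plots the mean feature value per quality level, which is one reasonable choice rather than the project's documented one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df: pd.DataFrame, feature: str = "alcohol", target: str = "quality"):
    """Hypothetical helper: scatter plot, line graph and histogram side by side."""
    fig, (ax_scatter, ax_line, ax_hist) = plt.subplots(1, 3, figsize=(15, 4))

    # Scatter plot: each sample's feature value against its quality score.
    ax_scatter.scatter(df[feature], df[target], alpha=0.4)
    ax_scatter.set_xlabel(feature)
    ax_scatter.set_ylabel(target)
    ax_scatter.set_title(f"{feature} vs {target}")

    # Line graph: mean feature value for each quality level.
    means = df.groupby(target)[feature].mean().sort_index()
    ax_line.plot(means.index.to_numpy(), means.to_numpy(), marker="o")
    ax_line.set_xlabel(target)
    ax_line.set_ylabel(f"mean {feature}")
    ax_line.set_title(f"Mean {feature} by {target}")

    # Histogram: distribution of the feature across all samples.
    ax_hist.hist(df[feature], bins=20)
    ax_hist.set_xlabel(feature)
    ax_hist.set_ylabel("count")
    ax_hist.set_title(f"Distribution of {feature}")

    fig.tight_layout()
    return fig
```

The returned figure can be saved with, for example, `plot_overview(df).savefig("overview.png")` (file name illustrative).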

Modeling

  • Algorithms Used: K-Nearest Neighbors (KNN) Classifier and Regressor
  • Train-Test Split: 80% training data and 20% testing data
  • Evaluation Metrics: Accuracy, Mean Squared Error (MSE)
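The modeling setup above can be sketched with scikit-learn as follows. The `n_neighbors=5` value, the fixed `random_state`, and the `StandardScaler` step are assumptions not stated in this README; scaling is included because KNN relies on distances and the raw features have very different ranges. The classifier treats each quality score as a class (scored with accuracy), while the regressor predicts quality as a number (scored with MSE).

```python
import pandas as pd
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_knn(df: pd.DataFrame, target: str = "quality",
                 n_neighbors: int = 5, random_state: int = 42) -> dict:
    """Hypothetical helper: fit KNN classifier and regressor on an 80/20 split."""
    X = df.drop(columns=[target])
    y = df[target]
    # 80% training, 20% testing; random_state fixed for reproducibility.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    # Assumption: standardize features before the distance-based KNN models.
    clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=n_neighbors))
    reg = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=n_neighbors))
    clf.fit(X_train, y_train)
    reg.fit(X_train, y_train)
    return {
        "accuracy": accuracy_score(y_test, clf.predict(X_test)),
        "mse": mean_squared_error(y_test, reg.predict(X_test)),
    }
```

Calling `evaluate_knn(df)` on the cleaned data returns a dictionary with `accuracy` and `mse` keys.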

Requirements

  • Python 3.x
  • Pandas
  • NumPy
  • Matplotlib
  • Scikit-learn

Usage

  1. Clone the repository:

    git clone https://github.com/devanmodhavadiya189/Data-Cleaning-and-Predictive-Modeling-Project
  2. Navigate to the project directory:

    cd Data-Cleaning-and-Predictive-Modeling-Project
  3. Install the required packages:

    pip install -r requirements.txt
  4. Open and run the analysis and modeling notebook (requires Jupyter):

    jupyter notebook main.ipynb

Contributors

  • Anas Multani
  • Devan Modhavadiya

License

This project is licensed under the MIT License - see the LICENSE file for details.