This project aims to predict wine quality based on various chemical attributes. The dataset contains 1599 samples with 12 columns, including 11 chemical properties and one quality rating. The primary objectives include data preprocessing, statistical analysis, visualization, and predictive modeling.
- Project Description
- Data
- Preprocessing
- Statistical Analysis
- Visualization
- Modeling
- Requirements
- Usage
- Contributors
- License
The dataset consists of 11 chemical attributes and a quality score ranging from 3 to 8. The project involves:
- Handling missing values.
- Performing statistical analysis.
- Visualizing data relationships.
- Training and evaluating machine learning models.
- File Name: winequality-red.csv
- Number of Samples: 1599
- Features:
  - Fixed acidity
  - Volatile acidity
  - Citric acid
  - Residual sugar
  - Chlorides
  - Free sulfur dioxide
  - Total sulfur dioxide
  - Density
  - pH
  - Sulphates
  - Alcohol
  - Quality (target variable)
- Null Value Handling: Replaced missing values in numerical columns with the mean and in categorical columns with the mode.
- Data Cleaning: Removed records containing null values or imputed the missing data, as appropriate.
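The imputation rule described above could be sketched roughly as follows. This is a minimal illustration with pandas, not the notebook's actual code: the helper name `impute_missing`, the lowercase column names, and the `label` column are hypothetical, and the sample rows are made up rather than taken from winequality-red.csv.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric NaNs with the column mean and other NaNs with the column mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    # Drop rows that still contain nulls (e.g. a column that was entirely empty).
    return out.dropna()


# Tiny made-up frame for illustration only.
sample = pd.DataFrame({
    "alcohol": [9.4, np.nan, 10.0],
    "pH": [3.5, 3.2, np.nan],
    "label": ["low", None, "low"],  # hypothetical categorical column
})
cleaned = impute_missing(sample)
print(cleaned)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare the data before and after cleaning.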
Performed statistical operations on the dataset, including:
- Count
- Sum
- Range
- Minimum
- Maximum
- Mean
- Median
- Mode
- Variance
- Standard Deviation
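As an illustration, the statistics listed above can be computed for a single column with pandas. This is a sketch under assumptions: `summarize` is a hypothetical helper, the values are made up, and variance and standard deviation use pandas' default sample definition (`ddof=1`), which may differ from what the notebook uses.

```python
import pandas as pd


def summarize(series: pd.Series) -> dict:
    """Return the descriptive statistics listed above for one numeric column."""
    return {
        "count": int(series.count()),
        "sum": float(series.sum()),
        "range": float(series.max() - series.min()),
        "min": float(series.min()),
        "max": float(series.max()),
        "mean": float(series.mean()),
        "median": float(series.median()),
        "mode": float(series.mode().iloc[0]),  # smallest value if several modes tie
        "variance": float(series.var()),       # sample variance (ddof=1)
        "std": float(series.std()),            # sample standard deviation (ddof=1)
    }


# Made-up quality scores for illustration only.
quality = pd.Series([5, 5, 6, 7, 5, 6], name="quality")
stats = summarize(quality)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

For a quick overview of every column at once, `DataFrame.describe()` reports count, mean, standard deviation, minimum, quartiles, and maximum.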
Used the following plot types to visualize the data:
- Scatter Plots
- Line Graphs
- Histograms
These visualizations help in understanding the relationships between different chemical attributes and wine quality.
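A minimal Matplotlib sketch of the three plot types might look like the following. The column names `alcohol` and `quality` are assumed, the rows are made up, and the choice of which attributes to plot is illustrative rather than taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 9.5, 10.9, 12.8],
    "quality": [5, 5, 6, 6, 7, 5, 6, 8],
})

fig, (ax_scatter, ax_line, ax_hist) = plt.subplots(1, 3, figsize=(14, 4))

# Scatter plot: one chemical attribute against the quality rating.
ax_scatter.scatter(df["alcohol"], df["quality"], alpha=0.7)
ax_scatter.set_xlabel("alcohol")
ax_scatter.set_ylabel("quality")

# Line graph: mean alcohol content at each quality level.
mean_by_quality = df.groupby("quality")["alcohol"].mean()
ax_line.plot(mean_by_quality.index, mean_by_quality.values, marker="o")
ax_line.set_xlabel("quality")
ax_line.set_ylabel("mean alcohol")

# Histogram: distribution of quality scores (one bin per integer from 3 to 8).
ax_hist.hist(df["quality"], bins=range(3, 10), align="left", rwidth=0.8)
ax_hist.set_xlabel("quality")
ax_hist.set_ylabel("count")

fig.tight_layout()
```

In a notebook the figure renders inline and the `matplotlib.use("Agg")` line can be dropped; in a script, `fig.savefig(...)` writes it to disk.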
- Algorithms Used: K-Nearest Neighbors (KNN) Classifier and Regressor
- Train-Test Split: 80% training data and 20% testing data
- Evaluation Metrics: Accuracy, Mean Squared Error (MSE)
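The modeling setup above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the project's code: the data is synthetic random noise standing in for the 11 features and the 3 to 8 quality score, and `n_neighbors=5` and `random_state=42` are arbitrary choices, so the printed scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Synthetic stand-in data: 200 samples, 11 features, integer quality in 3..8.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))
y = rng.integers(3, 9, size=200)

# 80% training data, 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Classifier: treats each quality score as a discrete class, scored by accuracy.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

# Regressor: treats quality as a continuous value, scored by mean squared error.
reg = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
mse = mean_squared_error(y_test, reg.predict(X_test))

print(f"KNN classifier accuracy: {accuracy:.3f}")
print(f"KNN regressor MSE: {mse:.3f}")
```

Because KNN relies on distances between samples, features on very different scales (for example total sulfur dioxide versus density) can dominate the result; standardizing the features first is a common option, though the source does not say whether the project does this.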
- Python 3.x
- Pandas
- NumPy
- Matplotlib
- Scikit-learn
- Clone the repository:
  git clone https://github.com/devanmodhavadiya189/Data-Cleaning-and-Predictive-Modeling-Project
- Navigate to the project directory:
  cd Data-Cleaning-and-Predictive-Modeling-Project
- Install the required packages:
  pip install -r requirements.txt
- Open and run the analysis and modeling notebook:
  jupyter notebook main.ipynb
- Anas Multani
- Devan Modhavadiya
This project is licensed under the MIT License - see the LICENSE file for details.