Skip to content

In this assignment, students will showcase their ability to perform EDA and Feature Engineering based on the a data science project.

Notifications You must be signed in to change notification settings

Omdena-NIC-Nepal/EDA-Feature-Engineering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Assignment Project: Exploratory Data Analysis and Feature Engineering on Second-Hand Car Dataset

Objective:

The objective of this assignment is to perform exploratory data analysis (EDA) and feature engineering on a dataset of second-hand cars in India. This will involve understanding the structure of the dataset, handling missing data and outliers, performing summary statistics and data visualization, and creating new features from the existing data.

Dataset Description:

The dataset contains the following variables:

Company Name: The manufacturer of the car (e.g., Maruti, Hyundai). Car Name: The model name of the car (e.g., Swift, i20).

Variant: The specific variant of the car model (e.g., VDI, Sportz).

Fuel Type: Type of fuel used by the car (e.g., Petrol, Diesel).

Tyre Condition: Condition of the tyres (e.g., New, Good, Average, Worn).

Make Year: Year the car was manufactured.

Owner Type: Number of previous owners (e.g., First, Second, Third).

**Transmission Type: **Type of transmission (e.g., Manual, Automatic).

Body Color: Color of the car's body.

Service Record: Whether the car has a service record (Yes/No). Insurance: Whether the car has insurance (Yes/No). Registration Certificate: Whether the car has a registration certificate (Yes/No). Accessory: Additional accessories available with the car. Mileage: The mileage of the car (in kmpl). **Price: **The selling price of the car.

Instructions:

Part 1: Understanding the Structure of the Dataset

  • Load the Dataset: Load the dataset into a pandas DataFrame.
  • Data Types: Identify the data types of each column.
  • Data Shape: Check the dimensions (rows and columns) of the dataset.
  • Feature Descriptions: Understand what each feature represents.

Part 2: Handling Missing Data and Outliers

Missing Data:

  • Identify missing values in the dataset.
  • Decide on strategies to handle missing data (imputation, deletion, or advanced methods). Outliers:
  • Identify outliers using visualization (box plots, scatter plots) and statistical methods (z-score, IQR).
  • Decide whether to remove outliers or transform them.

Part 3: Exploratory Data Analysis Techniques

Summary Statistics:

  • Calculate basic statistics such as mean, median, mode, standard deviation, and range for numerical data.
  • Count frequencies and percentages for categorical data. Data Visualization:
  • Use plots like histograms, box plots, scatter plots, and bar charts to visualize distributions, relationships between variables, and trends.

Part 4: Feature Engineering

Creating New Features:

  • Extract new features from existing ones, such as extracting year from Make Year.
  • Convert categorical variables into dummy variables. Transforming Existing Features:
  • Normalize or standardize numerical features.
  • Bin continuous variables into categorical ones.
  • Encode categorical variables into numerical form using label encoding or one-hot encoding. Feature Selection:
  • Identify highly correlated features to avoid multicollinearity.
  • Use techniques like PCA or feature importance from models to select the most relevant features.

About

In this assignment, students will showcase their ability to perform EDA and Feature Engineering based on the a data science project.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published