Statlog Dataset

Perform exploratory data analysis on the dataset and present the key insights, backed by suitable graphs and plots.

Dataset Description: The dataset is based on the “Statlog Dataset” from the UCI Machine Learning Repository. The columns of the dataset and their meanings are as follows:

Age (numeric)
Sex (text: male, female)
Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
Housing (text: own, rent, or free)
Saving accounts (text: little, moderate, quite rich, rich)
Checking account (text: little, moderate, rich)
Credit amount (numeric, in Deutsche Mark)
Duration (numeric, in months)
Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)

Assignment questions:

  1. Load the dataset into pandas and get a peek at the underlying data in the dataframe.

  2. Provide the following information about the dataframe:

    Dimensions of the dataframe
    Information about the schema
    Statistical metrics of each column

  3. Carry out the following data pre-processing steps only where necessary, and explain the reason for each step you take:

    Missing values
    Erroneous/wrong values
    Skewed data
    Outliers

  4. Perform exploratory data analysis and provide key insights, backed by suitable graphs and plots.
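One possible starting point for questions 1 to 3 is sketched below. It is not a reference solution: the helper names (`load_credit_data`, `summarize`, `missing_counts`, `skewness`, `iqr_outlier_mask`, `log_transform`) and the CSV file name in the usage comment are assumptions made for illustration, and the 1.5 × IQR rule and the `log1p` transform are just two common choices for outliers and skew, not something the assignment prescribes.

```python
import numpy as np
import pandas as pd


def load_credit_data(path):
    """Question 1: read the CSV at `path` into a DataFrame."""
    return pd.read_csv(path)


def summarize(df):
    """Question 2: dimensions, schema, and per-column statistics."""
    print(df.shape)  # (number of rows, number of columns)
    df.info()        # column dtypes and non-null counts
    # include="all" also summarizes text columns (count, unique, top, freq)
    return df.describe(include="all")


def missing_counts(df):
    """Question 3: number of missing values in each column."""
    return df.isna().sum()


def skewness(df):
    """Question 3: sample skewness of each numeric column."""
    return df.select_dtypes(include="number").skew()


def iqr_outlier_mask(series, k=1.5):
    """Question 3: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def log_transform(series):
    """Question 3: reduce right skew with log(1 + x); assumes values >= 0."""
    return np.log1p(series)


# Example usage (the file name is hypothetical, adjust it to your copy):
# df = load_credit_data("german_credit_data.csv")
# print(df.head())
# print(summarize(df))
# print(missing_counts(df))
# print(df[iqr_outlier_mask(df["Credit amount"])])
```

Whether a flagged value is really an error or just a large but valid loan is a judgment call, so it is worth inspecting the flagged rows before dropping or transforming anything.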

A few hints to get you started:

Distribution of numerical variables
Distribution of categorical variables
Numerical vs Categorical plots
Numerical vs Numerical plots
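The four kinds of plot hinted at above can be drawn with pandas' built-in plotting on top of matplotlib. The sketch below is only one way to lay them out; the function name `plot_overview` and its parameters are hypothetical, and the column names in the usage comment are taken from the column list in this README.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df, num_col, cat_col, other_num_col):
    """Draw one example of each hinted plot type on a 2x2 grid."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))

    # Distribution of a numerical variable: histogram
    df[num_col].plot.hist(bins=30, ax=axes[0, 0], title=f"{num_col} distribution")

    # Distribution of a categorical variable: bar chart of category counts
    df[cat_col].value_counts().plot.bar(ax=axes[0, 1], title=f"{cat_col} counts")

    # Numerical vs categorical: one box per category
    df.boxplot(column=num_col, by=cat_col, ax=axes[1, 0])

    # Numerical vs numerical: scatter plot
    df.plot.scatter(x=other_num_col, y=num_col, ax=axes[1, 1])

    fig.tight_layout()
    return fig


# Example usage (assumes `df` was loaded from the assignment CSV):
# fig = plot_overview(df, "Credit amount", "Housing", "Duration")
# fig.savefig("overview.png")
```

Each plot should be followed by a sentence or two stating what it shows, since the assignment asks for insights and not only charts.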