Statlog Dataset

Perform exploratory data analysis and provide key insights derived from it, backed by suitable graphs and plots.

Dataset Description: The dataset is based on the “Statlog Dataset” from the UCI Machine Learning Repository. The columns of the dataset and their meanings are as follows:

Age (numeric)
Sex (text: male, female)
Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
Housing (text: own, rent, or free)
Saving accounts (text: little, moderate, quite rich, rich)
Checking account (text: little, moderate, rich)
Credit amount (numeric, in Deutsche Mark)
Duration (numeric, in months)
Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
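
As a starting point, the column descriptions above can be turned into a quick load-and-check step. The following is a minimal sketch, not the reference solution: the inline CSV holds a few made-up rows (not real records), the header names are assumed to match the column names listed above, and `load_credit_data` and `unexpected_values` are hypothetical helper names.

```python
import io

import pandas as pd

# Made-up rows in the documented layout, standing in for the real file.
# Empty fields are read by pandas as missing values (NaN).
SAMPLE_CSV = """Age,Sex,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose
35,male,2,own,little,moderate,2400,12,car
28,female,1,rent,,little,1500,24,radio/TV
52,male,3,free,rich,,8000,36,business
41,female,0,own,moderate,rich,900,6,repairs
"""

# Allowed values per coded column, copied from the column descriptions.
DOCUMENTED_VALUES = {
    "Sex": {"male", "female"},
    "Job": {0, 1, 2, 3},
    "Housing": {"own", "rent", "free"},
    "Saving accounts": {"little", "moderate", "quite rich", "rich"},
    "Checking account": {"little", "moderate", "rich"},
    "Purpose": {
        "car", "furniture/equipment", "radio/TV", "domestic appliances",
        "repairs", "education", "business", "vacation/others",
    },
}


def load_credit_data(source):
    """Read the dataset from a path or file-like object into a dataframe."""
    return pd.read_csv(source)


def unexpected_values(df):
    """Map each coded column to the values that are not in the documented list."""
    problems = {}
    for column, allowed in DOCUMENTED_VALUES.items():
        seen = set(df[column].dropna().unique())
        extra = seen - allowed
        if extra:
            problems[column] = extra
    return problems


df = load_credit_data(io.StringIO(SAMPLE_CSV))

print(df.head())                    # peek at the first rows
print(df.shape)                     # dimensions: (rows, columns)
df.info()                           # schema: dtypes and non-null counts
print(df.describe(include="all"))   # summary statistics for every column
print(unexpected_values(df))        # empty dict when all codes are documented
```

With the real data, pass the path of the CSV file to `load_credit_data` instead of the in-memory sample.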

Assignment questions:

Load the dataset into pandas and get a peek at the underlying data in the dataframe.
Provide the following information about the dataframe:

Dimensions of the dataframe
Information about the schema
Statistical metrics of each column
Conduct the following data pre-processing steps only as necessary, and give the reason for each step you take:
Missing values
Erroneous/wrong values
Skewed data
Outliers
Perform exploratory data analysis and provide key insights derived from it, backed by suitable graphs and plots.

A few hints to get you started:
Distribution of numerical variables
Distribution of categorical variables
Numerical vs Categorical plots
Numerical vs Numerical plots
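
The pre-processing checks and the four plot types hinted at above could be sketched roughly as follows. This is one possible approach under stated assumptions, not the expected answer: the dataframe holds made-up rows with a subset of the documented columns, the helper names are hypothetical, the log transform and Tukey's 1.5 × IQR rule are common choices that the assignment does not prescribe, and plotting uses matplotlib through the pandas plotting API.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up rows using a subset of the documented columns; replace with the real dataframe.
df = pd.DataFrame({
    "Age": [25, 31, 47, 58, 36, 29],
    "Housing": ["own", "rent", "own", "free", "rent", "own"],
    "Saving accounts": ["little", None, "moderate", "little", "rich", None],
    "Credit amount": [1200, 2500, 900, 15000, 3100, 1800],
    "Duration": [12, 24, 6, 48, 18, 12],
    "Purpose": ["car", "radio/TV", "education", "business", "car", "repairs"],
})


def missing_summary(df):
    """Count and share of missing values per column, largest first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    return summary.sort_values("missing", ascending=False)


def iqr_outliers(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def plot_overview(df):
    """Draw one example of each hinted plot type on a 2x2 grid."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    # Distribution of a numerical variable
    df["Credit amount"].plot.hist(bins=20, ax=axes[0, 0], title="Credit amount")
    # Distribution of a categorical variable
    df["Purpose"].value_counts().plot.bar(ax=axes[0, 1], title="Purpose")
    # Numerical vs categorical
    df.boxplot(column="Credit amount", by="Housing", ax=axes[1, 0])
    # Numerical vs numerical
    df.plot.scatter(x="Duration", y="Credit amount", ax=axes[1, 1],
                    title="Credit amount vs Duration")
    fig.tight_layout()
    return fig


# Missing values: inspect first, then decide whether to fill, flag, or drop.
print(missing_summary(df))

# Skewed data: compare sample skewness before and after a log transform.
skew_before = df["Credit amount"].skew()
skew_after = np.log1p(df["Credit amount"]).skew()
print(f"skew before: {skew_before:.2f}, after log1p: {skew_after:.2f}")

# Outliers: list the rows flagged by the IQR rule before deciding what to do with them.
print(df[iqr_outliers(df["Credit amount"])])

fig = plot_overview(df)
```

Call `plt.show()` (or `fig.savefig(...)`) to view the figure. Whether a flagged value is a genuine outlier or a legitimate large loan is a judgment to justify in the write-up, which is why the sketch only reports the rows instead of dropping them.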
