Data Analysis Task

A data analysis task that covers the full data science life cycle.

Author: Davain Edwards

Please work through the notebooks in order. The only exception is notebook 2, which can be skipped.

The notebooks below use the datasets known.csv and unknown.csv:

  • 1_business_understanding.ipynb
  • 2_data_mining.ipynb (can be skipped)
  • 3_data_cleaning.ipynb
  • 4_data_exploration.ipynb
  • 5_feature_engineering.ipynb
  • 6_predictive_modelling_with_pycaret.ipynb (final model training, test evaluation and model saving)
  • 6_predictive_modelling_with_sklearn.ipynb (testing and error analysis)
  • 7_data_visualization.ipynb

Together they walk through all the steps of the data science life cycle:

    1. Business Understanding
    2. Data Mining
    3. Data Cleaning
    4. Exploratory Data Analysis
    5. Feature Engineering
    6. Predictive Modeling with Hyperparameter Tuning (Small Error Analysis)
    7. Data Visualization
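
The ordering rule stated earlier (run every notebook in sequence, with notebook 2 optional) could be scripted roughly as follows. This is a minimal sketch and not part of the repository: the helper names `notebooks_to_run`, `execute_command` and `run_all` are hypothetical, and it assumes the notebooks sit in the current directory and that `jupyter nbconvert` is available in the active environment.

```python
import subprocess

# Notebook file names in the order listed in this README.
NOTEBOOKS = [
    "1_business_understanding.ipynb",
    "2_data_mining.ipynb",
    "3_data_cleaning.ipynb",
    "4_data_exploration.ipynb",
    "5_feature_engineering.ipynb",
    "6_predictive_modelling_with_pycaret.ipynb",
    "6_predictive_modelling_with_sklearn.ipynb",
    "7_data_visualization.ipynb",
]

# The README marks notebook 2 as the only one that can be skipped.
OPTIONAL = {"2_data_mining.ipynb"}


def notebooks_to_run(skip_optional=True):
    """Return the notebooks in README order, optionally dropping notebook 2."""
    return [nb for nb in NOTEBOOKS if not (skip_optional and nb in OPTIONAL)]


def execute_command(notebook):
    """Build an nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", "--inplace",
        notebook,
    ]


def run_all(skip_optional=True):
    """Execute the notebooks one after another, stopping at the first failure."""
    for notebook in notebooks_to_run(skip_optional):
        subprocess.run(execute_command(notebook), check=True)
```

Calling `run_all()` would execute seven notebooks, while `run_all(skip_optional=False)` would also include 2_data_mining.ipynb.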

Requirements

  • pyenv
  • python==3.8.5

Setup

Create and activate a virtual environment, then install the dependencies with the following commands:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
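
Because the requirements pin python==3.8.5, it can be useful to confirm that the activated environment really uses that interpreter. A minimal sketch, assuming the pin is meant as an exact match; the function name `matches_required` is hypothetical and not part of the repository:

```python
import sys

# Version pinned in the Requirements section of this README.
REQUIRED = (3, 8, 5)


def matches_required(version_info=None, required=REQUIRED):
    """Return True if (major, minor, micro) equals the pinned version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(required)
```

Running `matches_required()` inside the activated .venv reports whether the interpreter there is the pinned one.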

Clean Notebooks

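To clear the cell outputs of a notebook in place, replace [NOTEBOOK.ipynb] with the notebook's file name:

jupyter nbconvert --clear-output --inplace [NOTEBOOK.ipynb]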
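
To apply the same command to every notebook at once, a small wrapper along these lines might help. It is only a sketch: the function names `clear_output_command`, `notebooks_in` and `clear_all` are hypothetical, and it assumes all notebooks live directly in one directory.

```python
import subprocess
from pathlib import Path


def clear_output_command(notebook):
    """Build the README's nbconvert command for a single notebook."""
    return ["jupyter", "nbconvert", "--clear-output", "--inplace", str(notebook)]


def notebooks_in(directory="."):
    """Return the .ipynb files directly inside `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.ipynb"))


def clear_all(directory="."):
    """Clear the outputs of every notebook in `directory`, one at a time."""
    for notebook in notebooks_in(directory):
        subprocess.run(clear_output_command(notebook), check=True)
```

Calling `clear_all()` from the repository root would run the clean command once per notebook found there.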