Regression data analysis task covering the full data science lifecycle. Predictive modelling with scikit-learn (sklearn) and the low-code library PyCaret.

dpedwards/ds-data-analysis-task

Data Analysis Task

A data analysis task covering the full data science lifecycle

Author: Davain Edwards

Figure: data science cycle

Please work through the notebooks in order. The only exception is notebook 2, which can be skipped.

Using the datasets known.csv and unknown.csv, the following notebooks walk through every step of the data science life cycle:

1_business_understanding.ipynb
2_data_mining.ipynb <-- Can be skipped!
3_data_cleaning.ipynb
4_data_exploration.ipynb
5_feature_engineering.ipynb
6_predictive_modelling_with_pycaret.ipynb <-- Final model training, test evaluation and model saving!
6_predictive_modelling_with_sklearn.ipynb <-- Testing and error analysis
7_data_visualization.ipynb

The life cycle thus consists of the following steps:

    1. Business Understanding
    2. Data Mining
    3. Data Cleaning
    4. Exploratory Data Analysis
    5. Feature Engineering
    6. Predictive Modeling with Hyperparameter Tuning (Small Error Analysis)
    7. Data Visualization
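
The notebook order described above can also be expressed in a few lines of Python. This is only a minimal sketch, not part of the repository: the helper names `ordered_notebooks` and `nbconvert_commands` are hypothetical, and running the notebooks non-interactively with `jupyter nbconvert --execute` is an assumption (the README itself only asks you to work through them in order).

```python
import re
from typing import List

# Notebook file names exactly as listed in this README.
NOTEBOOKS = [
    "1_business_understanding.ipynb",
    "2_data_mining.ipynb",  # optional, can be skipped
    "3_data_cleaning.ipynb",
    "4_data_exploration.ipynb",
    "5_feature_engineering.ipynb",
    "6_predictive_modelling_with_pycaret.ipynb",
    "6_predictive_modelling_with_sklearn.ipynb",
    "7_data_visualization.ipynb",
]


def _prefix(name: str) -> int:
    """Return the leading step number of a notebook file name."""
    match = re.match(r"(\d+)_", name)
    if match is None:
        raise ValueError(f"no numeric prefix in {name!r}")
    return int(match.group(1))


def ordered_notebooks(names: List[str], skip_optional: bool = True) -> List[str]:
    """Sort notebooks by step number (ties broken by name).

    With skip_optional=True, notebook 2 (data mining) is left out,
    since the README marks it as skippable.
    """
    ordered = sorted(names, key=lambda n: (_prefix(n), n))
    if skip_optional:
        ordered = [n for n in ordered if _prefix(n) != 2]
    return ordered


def nbconvert_commands(names: List[str]) -> List[List[str]]:
    """Build (but do not run) one nbconvert command per notebook."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", name]
        for name in names
    ]


if __name__ == "__main__":
    for command in nbconvert_commands(ordered_notebooks(NOTEBOOKS)):
        print(" ".join(command))
```

The script only prints the commands. Passing each list to `subprocess.run(command, check=True)` would execute the notebooks one after another and stop at the first failure.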

Requirements

  • pyenv
  • python==3.8.5

Setup

Create and activate a virtual environment, then install the dependencies with the following commands:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Clean Notebooks

To clear all cell outputs from a notebook in place, run:

jupyter nbconvert --clear-output --inplace [NOTEBOOK.ipynb]
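
If nbconvert is not at hand, a rough equivalent can be sketched with the standard library alone. The sketch below assumes notebooks in the nbformat 4 JSON layout (a top-level `cells` list); the function names `clear_outputs` and `clear_file` are hypothetical, and the code does not try to reproduce everything nbconvert does, only the removal of outputs and execution counts.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Remove outputs and execution counts from all code cells.

    Assumes the nbformat 4 layout: a top-level "cells" list in which
    code cells carry "outputs" and "execution_count" keys.
    The dict is modified in place and also returned.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_file(path) -> None:
    """Clear the outputs of one .ipynb file in place."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    clear_outputs(notebook)
    path.write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


if __name__ == "__main__":
    # Clear every notebook in the current directory.
    for nb_path in sorted(Path(".").glob("*.ipynb")):
        clear_file(nb_path)
        print(f"cleared {nb_path}")
```

Because the files are rewritten in place, it is safest to run this on a clean working tree so that any unwanted change can be reverted.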
