A common frustration in industry, especially when extracting business insights from tabular data, is that the questions most interesting to the business often cannot be answered with observational data alone. Examples include:
- “What will happen if I halve the price of my product?”
- “Which clients will pay their debts only if I call them?”
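
To make the gap between observational and interventional answers concrete, here is a minimal simulation sketch built on the debt-collection question above. Everything in it is an assumption made for illustration: the hidden confounder (`struggling`), the payment probabilities, the calling policy, and the function name `simulate`. Under these assumed numbers, the observed difference in payment rates between called and uncalled clients is about -0.20 in expectation, while the true effect of a call is +0.10.

```python
import random


def simulate(n=100_000, seed=0):
    """Compare a naive observational estimate with a randomized one.

    All probabilities are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    obs_called, obs_not_called = [], []    # outcomes under the biased policy
    rand_called, rand_not_called = [], []  # outcomes under random assignment

    for _ in range(n):
        # Hidden confounder: clients in financial difficulty pay less often.
        struggling = rng.random() < 0.5
        base_rate = 0.3 if struggling else 0.8

        # Observational policy: struggling clients are called far more often.
        called = rng.random() < (0.8 if struggling else 0.2)
        # Assumed true effect: a call adds 0.10 to the payment probability.
        paid = rng.random() < base_rate + (0.1 if called else 0.0)
        (obs_called if called else obs_not_called).append(paid)

        # Intervention: assign the call by coin flip, ignoring the confounder.
        called_rand = rng.random() < 0.5
        paid_rand = rng.random() < base_rate + (0.1 if called_rand else 0.0)
        (rand_called if called_rand else rand_not_called).append(paid_rand)

    def mean(values):
        return sum(values) / len(values)

    naive_estimate = mean(obs_called) - mean(obs_not_called)
    causal_estimate = mean(rand_called) - mean(rand_not_called)
    return naive_estimate, causal_estimate


if __name__ == "__main__":
    naive, causal = simulate()
    print(f"naive observational difference: {naive:+.3f}")
    print(f"randomized (interventional) difference: {causal:+.3f}")
```

In this toy setup the observational data alone would suggest that calling clients hurts repayment, which is the kind of wrong conclusion a causal model is meant to prevent.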
-
The causal graph is a central object in causal inference, but it is often unknown, shaped by personal knowledge and bias, or only loosely connected to the available data. The main objective of this task is to show, in a concrete way, why the causal graph matters.
-
In this project we applied a causal graph model to a breast cancer dataset, using a library called CausalNex, to learn about the cause-and-effect structure behind the diagnosis of breast cancer.
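
As a rough illustration of what a causal graph looks like as a data structure, the sketch below stores weighted directed edges, prunes weak ones, and checks that the result is a directed acyclic graph (DAG). It is plain Python and does not use CausalNex's API. The column names come from the Breast Cancer Wisconsin dataset, but the edges and weights are hypothetical placeholders, not results from this project, and the helper names (`prune`, `is_acyclic`, `parents`) are made up for this example.

```python
from collections import defaultdict

# Hypothetical (cause, effect, weight) triples. Placeholder values only:
# they do NOT reflect the structure learned in this project.
EDGES = [
    ("radius_mean", "area_mean", 0.9),
    ("area_mean", "diagnosis", 0.7),
    ("concavity_mean", "diagnosis", 0.6),
    ("texture_mean", "diagnosis", 0.05),
]


def prune(edges, threshold):
    """Keep only edges whose absolute weight reaches the threshold."""
    return [(u, v, w) for u, v, w in edges if abs(w) >= threshold]


def is_acyclic(edges):
    """Return True if the edges form a DAG (Kahn's algorithm)."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, _ in edges:
        children[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))

    # Start from nodes with no incoming edges and peel the graph apart.
    queue = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any node never reached sits on a cycle.
    return visited == len(nodes)


def parents(edges, node):
    """Direct causes of a node according to the graph."""
    return sorted(u for u, v, _ in edges if v == node)


if __name__ == "__main__":
    graph = prune(EDGES, threshold=0.1)
    print("is a DAG:", is_acyclic(graph))
    print("parents of diagnosis:", parents(graph, "diagnosis"))
```

Pruning weak edges and then verifying acyclicity mirrors the usual workflow after structure learning, where a learned graph is thresholded and reviewed with domain knowledge before it is trusted.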
The repository contains Python scripts, Jupyter notebooks, PDFs, and text files. Here is its structure with a brief explanation of each item.
- The data is the Breast Cancer Wisconsin (Diagnostic) Data Set, downloaded from Kaggle
- EDA.ipynb: a Jupyter notebook for exploratory data analysis
-
- a Python script for logging
-
- a Python script for reading and writing CSV, pickle, and other files
-
- class for exploring the data
-
- helper functions for cleaning DataFrames
-
- a class for handling DataFrame outliers
-
- update the code per request
-
- a class for Exploratory Data Analysis
-
- a class for reading and saving DataFrames
-
- a collection of methods for plotting a graph
-
- a collection of methods for plotting a graph
- the folder containing unit tests for components in the scripts
- the folder containing log files (if it doesn't exist it will be created once logging starts)
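
The behavior described for the log folder (created on first use if missing) can be sketched with the standard library as follows. This is a minimal illustration, not the repository's actual logging script: the function name `get_logger`, the default folder name `logs`, and the file name `app.log` are assumptions.

```python
import logging
from pathlib import Path


def get_logger(name, log_dir="logs", filename="app.log"):
    """Return a file logger, creating the log folder if it is missing.

    The defaults for log_dir and filename are hypothetical.
    """
    # Create the folder (and any parents) only when it does not exist yet.
    Path(log_dir).mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Avoid attaching a second handler if the logger was already configured.
    if not logger.handlers:
        handler = logging.FileHandler(Path(log_dir) / filename)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = get_logger("example")
    log.info("logging started")
```

Checking `logger.handlers` before adding a handler keeps repeated calls, which are common in notebooks, from writing each message several times.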