- Preprocessing functions to read in CSV data and manipulate it
- Functions that return a number of essential metrics on model performance
- Functions that generate visualizations to better understand the model
To use these model selection tools, you'll need to:
- Clone this repository:
  $ git clone https://github.com/mmoderwell/model_selection.git
  $ cd model_selection
- Copy analysis.py to your project directory and install the required packages:
  $ cp analysis.py ../path/to/project
  $ pip3 install numpy matplotlib matplotlib_venn seaborn
## Using the functions
import analysis
analysis.performance(estimated, actual, visualize=True, verbose=True)
Arguments:
- estimated: array of estimated output probabilities
- actual: array of actual output classifications
- Optional: visualize (bool), verbose (bool)
Returns: the accuracy; optionally prints additional metrics and draws a performance visualization.
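To make the accuracy figure concrete, here is a minimal sketch of how accuracy could be derived from estimated probabilities and actual classifications. It is not the code in analysis.py: the helper name `accuracy_sketch` and the 0.5 decision threshold are assumptions made for illustration only.

```python
import numpy as np


def accuracy_sketch(estimated, actual, threshold=0.5):
    """Hypothetical stand-in for illustration; not the code in analysis.py."""
    estimated = np.asarray(estimated, dtype=float)
    actual = np.asarray(actual, dtype=int)
    # Assumption: a probability at or above the threshold counts as class 1.
    predicted = (estimated >= threshold).astype(int)
    # Accuracy is the fraction of predictions that match the actual class.
    return float(np.mean(predicted == actual))


estimated = [0.91, 0.12, 0.67, 0.45, 0.08]
actual = [1, 0, 1, 1, 0]
accuracy = accuracy_sketch(estimated, actual)
print(f"accuracy: {accuracy:.2f}")
```

With the real module, the same two arrays would be passed to analysis.performance(estimated, actual) after copying analysis.py into the project.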
import analysis
analysis.distribution_metric(estimated, actual, precision=2, visualize=True, verbose=True)
Arguments:
- estimated: array of estimated output probabilities
- actual: array of actual output classifications
- Optional: precision (int), visualize (bool), verbose (bool)
Output: prints the percentage of predictions that fall more than 1 standard deviation from the mean number of predictions at each probability; optionally draws a visualization.
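The description of this metric can be read in more than one way. The sketch below shows one possible reading, under stated assumptions: probabilities are rounded to `precision` decimal places, predictions are counted at each rounded probability, and the result is the share of predictions that sit at probabilities whose count deviates from the mean count by more than one standard deviation. The helper name `distribution_metric_sketch` is hypothetical, the `actual` argument is left out because its role is not described, and the real function in analysis.py may compute something different.

```python
import numpy as np


def distribution_metric_sketch(estimated, precision=2):
    """One possible reading of the metric; not the code in analysis.py."""
    # Assumption: probabilities are grouped by rounding to `precision` decimals.
    rounded = np.round(np.asarray(estimated, dtype=float), precision)
    # Count how many predictions land on each distinct rounded probability.
    _, counts = np.unique(rounded, return_counts=True)
    mean, std = counts.mean(), counts.std()
    # Flag probabilities whose count is more than one std away from the mean.
    outside = np.abs(counts - mean) > std
    # Percentage of all predictions that sit at a flagged probability.
    return 100.0 * counts[outside].sum() / counts.sum()


estimated = [0.1, 0.1, 0.1, 0.1, 0.5, 0.9, 0.9]
percent_outside = distribution_metric_sketch(estimated, precision=1)
print(f"{percent_outside:.2f}% of predictions outside 1 standard deviation")
```

Under this reading, a value near 0 suggests predictions are spread evenly across probabilities, while a large value suggests they pile up at (or avoid) a few probabilities.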
Within /notebooks, you can try out these functions by running the analysis notebook.
First, install the extra packages:
$ pip install -r requirements.txt
- Matt Moderwell - Initial work - mmoderwell.com
Also see the list of contributors who participated in this project.