AliMMehr/logistic-regression-pm

Logistic Regression Plus-Minus

This repository contains the code for David Poole, Ali Mohammad Mehr, Wan Shing Martin Wang, "Conditioning on “and nothing else”: Simple Models of Missing Data between Naive Bayes and Logistic Regression," ICML 2020 Workshop Artemiss Submission.

Running the code

  1. pip install -r requirements.txt

  2. ./run_tests.sh 1 200 10

Explanation of the above commands

pip install -r requirements.txt installs the Python dependencies for the code. The main dependencies are sklearn and pymc3.

./run_tests.sh 1 200 10 runs the tests and produces the output graphs. The script accepts three arguments:

  • start_id: The id of the first test. Suggested value: 1

  • end_id: The id of the last test. Suggested value: 200

  • The number of tests to run concurrently. Suggested value: 10 (if your system has at least 10 CPU threads)

The script runs $$(end_id-start_id+1)$$ tests. Each test generates three random datasets: a C dataset, a D4 dataset, and a DLR dataset. On each generated dataset, the code trains the four models described in our paper: LR$$\pm$$, Naive Bayes, model (c), and model (d). For each generated dataset, the log loss of each model is stored in an individual JSON file in the results/ directory.
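For readers unfamiliar with the metric, the following is a minimal sketch of binary log loss, assuming the usual natural-log definition (the average negative log-likelihood of the true labels). The function name binary_log_loss and the clipping constant eps are illustrative and are not taken from this repository, which may compute the metric differently (for example through sklearn).

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood of binary labels (illustrative helper).

    y_true: iterable of 0/1 labels.
    p_pred: iterable of predicted probabilities that the label is 1.
    eps:    clipping constant (an assumption) that keeps log() finite.
    """
    y_true = list(y_true)
    p_pred = list(p_pred)
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must be non-empty and equally long")

    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip the probability away from 0 and 1 so the logarithm is defined.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # A model that always predicts 0.5 scores log(2) regardless of the labels.
    print(binary_log_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

Lower values are better: a model that assigns high probability to the correct label on every example approaches a log loss of 0.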

When all tests have finished, the script runs python Read-test-results-and-plot-graphs.py, which produces the graphs shown in the paper and prints average log loss comparisons.
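The averaging step can be sketched roughly as follows. This is not the code in Read-test-results-and-plot-graphs.py: the JSON layout used here (one object per file with the keys "dataset", "model", and "logloss") and the function name average_logloss are assumptions made only for illustration, and the actual files in results/ may be structured differently.

```python
import json
from collections import defaultdict
from pathlib import Path


def average_logloss(results_dir):
    """Average the log loss per (dataset, model) pair over all JSON files.

    Assumed (hypothetical) file layout, one object per file:
        {"dataset": "C", "model": "Naive Bayes", "logloss": 0.41}
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as f:
            record = json.load(f)
        key = (record["dataset"], record["model"])
        sums[key] += record["logloss"]
        counts[key] += 1
    # Divide each running sum by the number of files that contributed to it.
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    for (dataset, model), value in sorted(average_logloss("results").items()):
        print(f"{dataset:>4}  {model:<12}  {value:.4f}")
```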

With the suggested argument values, ./run_tests.sh runs 200 tests, 10 at a time, across 10 CPU threads. Note that this full run takes more than 6 hours.
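The arithmetic behind these numbers can be checked with a short sketch. The function plan_run is hypothetical and is not part of run_tests.sh; it only assumes that test ids are consecutive integers and that at most n_concurrent tests run at any one time.

```python
import math


def plan_run(start_id, end_id, n_concurrent):
    """Return (number of tests, minimum number of rounds) for a run.

    Hypothetical helper: assumes ids start_id..end_id inclusive and at most
    n_concurrent tests running at once.
    """
    if end_id < start_id:
        raise ValueError("end_id must be >= start_id")
    if n_concurrent < 1:
        raise ValueError("n_concurrent must be at least 1")
    n_tests = end_id - start_id + 1
    # With n_concurrent slots, at least this many rounds of tests are needed.
    n_rounds = math.ceil(n_tests / n_concurrent)
    return n_tests, n_rounds


if __name__ == "__main__":
    print(plan_run(1, 200, 10))  # (200, 20)
```

To try the pipeline on a smaller budget, a narrower id range such as ./run_tests.sh 1 20 10 would run only 20 tests, though the resulting averages will be noisier.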
