demodq

Directory Structure

Jupyter Notebooks:

(RQ1) detect_errors-combined investigates error types for each dataset broken down by sensitive attributes.
(RQ1) deep-dive-data-errors-mislabels is an exploration of potentially mislabeled samples in the raw datasets.
(RQ2) compute-result-table computes fairness metrics and statistical significance from the raw results in cleanml-results and writes the resulting table to cleanml.csv.
(RQ2) cleanml-analysis generates the table in our paper that counts the cases with negative, insignificant, and positive impact on fairness and on accuracy. It also counts, for each dataset and error type, the experimental conditions that had a non-negative impact on fairness.
(RQ2) cleanml-analysis-per-model groups all experiments by model type and error type and tallies the impact on fairness and on accuracy.
(RQ2) cleanml-analysis-cleaning-type counts cases with positive impact on fairness for each data cleaning method.
(RQ2) cleanml-accuracies identifies the best model types for each data error type with respect to model accuracy.
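
To illustrate the kind of tally the RQ2 notebooks describe, here is a minimal sketch that labels each experiment as having a negative, insignificant, or positive impact and counts the labels. It is not the code in the notebooks. This README does not say which significance test they use, so the sketch swaps in an exact paired sign-flip permutation test. The function names, the example keys and scores, the 0.05 threshold, and the "higher score is better" convention are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from statistics import mean


def sign_flip_p_value(diffs):
    """Exact two-sided p-value of a paired sign-flip permutation test.

    Enumerates all 2**n sign assignments, so it only suits small n.
    """
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point noise in ties.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_impact(dirty_scores, clean_scores, alpha=0.05):
    """Label the change from dirty to clean scores (assumes higher is better)."""
    diffs = [c - d for c, d in zip(clean_scores, dirty_scores)]
    if sign_flip_p_value(diffs) >= alpha:
        return "insignificant"
    return "positive" if mean(diffs) > 0 else "negative"


# Hypothetical paired scores per (dataset, error type): (dirty, clean).
experiments = {
    ("dataset_a", "missing_values"): (
        [0.70, 0.71, 0.69, 0.72, 0.70, 0.71, 0.68, 0.70],
        [0.75, 0.76, 0.74, 0.78, 0.75, 0.77, 0.73, 0.76],
    ),
    ("dataset_a", "outliers"): (
        [0.75, 0.76, 0.74, 0.78, 0.75, 0.77, 0.73, 0.76],
        [0.70, 0.71, 0.69, 0.72, 0.70, 0.71, 0.68, 0.70],
    ),
    ("dataset_b", "mislabels"): (
        [0.70] * 8,
        [0.71, 0.69, 0.72, 0.68, 0.71, 0.69, 0.72, 0.68],
    ),
}

# Count how many experiments fall into each impact category.
tally = Counter(classify_impact(dirty, clean) for dirty, clean in experiments.values())
print(dict(tally))
```

For a fairness metric where a lower value is better (for example, a disparity between groups), the sign of the difference would need to be flipped before labeling.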

# Set up virtual env
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Start Jupyter server
jupyter notebook
