- Place the training data in data/Challenge and the test data in data/Evaluation.
- Run src/IO.py to generate the csv files used in the following steps.
- Run src/model-size.py to get the sizes of models with various parameters.
- Run src/table-maker.py to get the estimated performance of the models under various parameters.
- Select the best pair of parameters (d_cn, k_var) using the tables from the previous two steps.
- Run src/method.py to get the final model.
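The selection step is not automated by the scripts listed here, and the criterion for "best" is not spelled out. As a rough sketch only, the code below assumes that each table is a csv whose header row lists the k_var values and whose first column lists the d_cn values, and that "best" means the highest AUC among models below a size limit. The layout, the size limit, and the helper names `read_table` and `select_best` are all assumptions for illustration. They are not part of the repository.

```python
import csv
import io


def read_table(text):
    """Parse a csv table into {(d_cn, k_var): value}.

    Assumed layout (hypothetical): header row holds k_var values,
    first column holds d_cn values.
    """
    rows = list(csv.reader(io.StringIO(text)))
    k_vars = [float(k) for k in rows[0][1:]]
    table = {}
    for row in rows[1:]:
        d_cn = float(row[0])
        for k_var, cell in zip(k_vars, row[1:]):
            table[(d_cn, k_var)] = float(cell)
    return table


def select_best(aucs, sizes, max_size):
    """Return the (d_cn, k_var) pair with the highest AUC whose model size
    does not exceed max_size (an assumed selection criterion)."""
    feasible = [p for p in aucs if sizes.get(p, float("inf")) <= max_size]
    if not feasible:
        raise ValueError("no parameter pair satisfies the size limit")
    return max(feasible, key=lambda p: aucs[p])


# Toy tables, made up purely to exercise the functions.
toy_aucs = read_table("d_cn,10,20\n0.1,0.70,0.80\n0.2,0.75,0.90\n")
toy_sizes = read_table("d_cn,10,20\n0.1,5,12\n0.2,8,30\n")
best_pair = select_best(toy_aucs, toy_sizes, max_size=15)
```

With real data, the contents of the AUC table and the model-size table would be read from disk and passed to `read_table`, after checking that their actual layout matches the assumption above.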
Reads the training data in data/Challenge and the test data in data/Evaluation, and generates the csv files for SNPeff. More precisely, this script outputs three csv files:
- out/SNPeff/SNPeff_train.csv
- out/SNPeff/SNPeff_test.csv
- out/SNPeff/variant_gene_list.csv
python3 src/IO.py
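Before moving on to the later steps, it can help to confirm that all three files were written. The following is a small sketch, not part of the repository. The helper name `missing_outputs` is hypothetical, and the default directory is taken from the paths listed above.

```python
from pathlib import Path

# File names taken from the list of outputs of src/IO.py.
EXPECTED = ["SNPeff_train.csv", "SNPeff_test.csv", "variant_gene_list.csv"]


def missing_outputs(out_dir="out/SNPeff"):
    """Return the expected csv files that are not present in out_dir."""
    return [name for name in EXPECTED if not (Path(out_dir) / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All SNPeff csv files are present.")
```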
Given the training data in data/Challenge, the test data in data/Evaluation, the csv files in out/SNPeff, and a pair (d_cn, k_var) from argv, outputs the shallow network models trained by our method.
python3 src/method.py --cn 0.1 --snpeff 260 --repeat 10 --path out/model/
Given the training data in data/Challenge, the csv files in out/SNPeff, and n and a range of (d_cn, k_var) from argv, outputs tables that describe the performance of our method under n-fold cross-validation for each (d_cn, k_var). More precisely, this script outputs four csv files:
- [path]/exact_aucs.csv
- [path]/exact_accs.csv
- [path]/approx_aucs.csv
- [path]/approx_accs.csv
python3 src/table-maker.py --cn-start 0 --cn-step 0.01 --cn-stop 0.4 --snpeff-start 0 --snpeff-step 10 --snpeff-stop 600 --repeat 10 --path out/table/
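The start/step/stop flags define a grid of (d_cn, k_var) pairs, and the run time grows with the number of pairs. The sketch below estimates the grid size for a given command line. Whether the script treats the stop value as inclusive is not stated here, so the `inclusive` parameter is an assumption to be checked against the script, and `grid_values` is a hypothetical helper.

```python
def grid_values(start, step, stop, inclusive=True):
    """Enumerate start, start + step, ... up to stop.

    Indices are computed with round() so floating-point drift does not
    add or drop a value. Whether stop is included is an assumption.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    n = round((stop - start) / step)
    count = n + 1 if inclusive else n
    return [start + i * step for i in range(count)]


# Values from the example command line above.
cn_values = grid_values(0, 0.01, 0.4)
snpeff_values = grid_values(0, 10, 600)
num_pairs = len(cn_values) * len(snpeff_values)
```

If the stop values turn out to be exclusive, passing `inclusive=False` gives the corresponding smaller grid.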
Given the training data in data/Challenge, the csv files in out/SNPeff, and a range of (d_cn, k_var) from argv, outputs a table of the model sizes for each (d_cn, k_var).
python3 src/model-size.py --cn-start 0 --cn-step 0.01 --cn-stop 0.4 --snpeff-start 0 --snpeff-step 10 --snpeff-stop 600 --path out/modelTMP/num_cands.csv