This repository accompanies the paper "PP-GWAS: Privacy Preserving Multi-Site Genome-wide Association Studies". It provides the code needed to reproduce the experiments described in the paper (synthetic dataset generation, dataset loading, and distributed computation), together with the experiment results.
- Code/
  PP-GWAS implementation for HPC clusters (SLURM-ready).
  Default outputs are written to test_site/.
- Data_Generation/
  Scripts for simulating synthetic genotype/phenotype data with pysnptools.
  Generated datasets are saved to test_site/ by default.
- REGENIE/
  Instructions to restructure synthetic data into REGENIE-supported formats and run REGENIE.
- Results/
  .txt outputs from the experiments reported in the paper.
- local_run/
  Self-contained setup (Jupyter notebook + simple GUI) for running PP-GWAS locally on small datasets.
- Prepare data and run PP-GWAS locally (small data): use local_run/.
- Prepare data and run PP-GWAS on a cluster: use Code/. Outputs go to test_site/.
- Run REGENIE: go to REGENIE/ and follow the steps there.
- Prepare data: use Data_Generation/; outputs appear in test_site/.
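For readers who want a feel for what simulating genotype/phenotype data involves before opening Data_Generation/, here is a minimal sketch using only the Python standard library. It is not the repository's code and does not use pysnptools. The function names, the Hardy-Weinberg sampling scheme, the allele-frequency range, and the effect sizes are all assumptions chosen for illustration.

```python
import random


def simulate_genotypes(n_samples, n_snps, rng):
    """Draw minor-allele counts (0, 1, or 2) per sample and SNP.

    Assumption: each SNP is in Hardy-Weinberg equilibrium, so a genotype
    is the sum of two independent Bernoulli(maf) allele draws.
    """
    # One minor-allele frequency per SNP; the [0.05, 0.5] range is illustrative.
    mafs = [rng.uniform(0.05, 0.5) for _ in range(n_snps)]
    genotypes = [
        [(rng.random() < maf) + (rng.random() < maf) for maf in mafs]
        for _ in range(n_samples)
    ]
    return genotypes, mafs


def simulate_phenotypes(genotypes, effects, noise_sd, rng):
    """Quantitative phenotype: additive genetic effect plus Gaussian noise."""
    return [
        sum(g * b for g, b in zip(row, effects)) + rng.gauss(0.0, noise_sd)
        for row in genotypes
    ]


rng = random.Random(0)  # fixed seed so the toy data is reproducible
n_samples, n_snps = 200, 50
genotypes, mafs = simulate_genotypes(n_samples, n_snps, rng)

# Hypothetical effect sizes: the first 5 SNPs are causal, the rest are null.
effects = [0.5 if j < 5 else 0.0 for j in range(n_snps)]
phenotypes = simulate_phenotypes(genotypes, effects, noise_sd=1.0, rng=rng)
```

The result is a samples-by-SNPs genotype matrix and a matching phenotype vector, which is the general shape of input a GWAS pipeline works with. For the datasets used in the paper, rely on the scripts in Data_Generation/.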
Please consider citing our work if it is beneficial to your research.
- Paper
@article{swaminathan2024pp,
title={PP-GWAS: Privacy Preserving Multi-Site Genome-wide Association Studies},
author={Swaminathan, Arjhun and Hannemann, Anika and {\"U}nal, Ali Burak and Pfeifer, Nico and Akg{\"u}n, Mete},
journal={arXiv preprint arXiv:2410.08122},
year={2024}
}
- Code
@software{swaminathan2025code,
author = {Swaminathan, Arjhun and Hannemann, Anika and {\"U}nal, Ali Burak and Pfeifer, Nico and Akg{\"u}n, Mete},
title = {PP-GWAS: Privacy Preserving Multi-Site Genome-wide Association Studies — code},
version = {v1.0},
publisher= {Zenodo},
year = {2025},
doi = {10.5281/zenodo.17580283},
url = {https://doi.org/10.5281/zenodo.17580283}
}
This project is released under the MIT License. See LICENSE for details.