This repository holds the source code of my Master's thesis, titled "Determining Population Structure in Bioinformatics Problems by Bayesian Inference Methods". My supervisor was Prof. Seyed Mahmoud Taheri, and my advisor was Dr. Seyed Morteza Amini. I defended on 21 October 2020 in front of a jury composed of:
- Prof. Seyed Mahmoud Taheri,
- Dr. Seyed Morteza Amini,
- Dr. Ali Kamandi,
- Dr. Firoozeh Rivaz (Shahid Beheshti University).
The slides that I used are available at BayesianGWAS.pdf.
Association mapping of genetic diseases is an important topic in bioinformatics and human genome studies, and it has attracted extensive research interest in recent years. Most disease association mapping algorithms are based on hypothesis testing procedures that test one variant at a time. These methods lose efficiency when the disease mutations are jointly tagged by multiple variants, or when gene-gene interactions exist. In addition, most of the data sources used with these methods are homogeneous, and the methods ignore population structure.
In this thesis, we describe a model-based clustering method that uses multi-locus genotype data and variational Bayes to infer population structure and assign individuals to sub-populations. In this method, multi-variant joint association in genome-wide case-control studies is detected by building the disease graph of each sub-population.
The proposed method improves the detection of disease-related loci by utilizing population structure when determining epistatic interactions.
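To make the clustering step in the abstract more concrete, here is a minimal sketch of mean-field variational Bayes for assigning individuals to sub-populations. It is not the MySTRUCTURE implementation. It assumes a simplified no-admixture model: biallelic diploid SNPs coded as 0/1/2 alternate-allele counts, a symmetric Dirichlet prior on the sub-population proportions, and Beta priors on per-population allele frequencies. The names `digamma`, `fit_populations`, and `toy_genotypes`, as well as the deterministic initialization, are hypothetical choices made for this illustration only.

```python
import math


def digamma(x):
    """Digamma function via upward recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def fit_populations(genotypes, n_clusters, n_iters=50, alpha=1.0, a=1.0, b=1.0):
    """Return q(z): one row of sub-population probabilities per individual.

    Assumed model: z_i ~ Cat(pi), pi ~ Dirichlet(alpha),
    p_kl ~ Beta(a, b), genotype_il | z_i = k ~ Binomial(2, p_kl).
    """
    n, n_loci = len(genotypes), len(genotypes[0])

    # Assumption: deterministic soft initialization that groups individuals
    # by their total alternate-allele count (used here for reproducibility).
    order = sorted(range(n), key=lambda i: sum(genotypes[i]))
    resp = [[0.0] * n_clusters for _ in range(n)]
    for rank, i in enumerate(order):
        home = rank * n_clusters // n
        raw = [4.0 if k == home else 1.0 for k in range(n_clusters)]
        resp[i] = [w / sum(raw) for w in raw]

    for _ in range(n_iters):
        # q(pi) is Dirichlet with the prior plus expected cluster sizes.
        dir_params = [alpha + sum(r[k] for r in resp) for k in range(n_clusters)]
        dir_total = digamma(sum(dir_params))
        e_log_pi = [digamma(d) - dir_total for d in dir_params]

        # q(p_kl) is Beta with the prior plus expected allele counts.
        e_log_alt, e_log_ref = [], []
        for k in range(n_clusters):
            row_alt, row_ref = [], []
            for l in range(n_loci):
                alt = a + sum(resp[i][k] * genotypes[i][l] for i in range(n))
                ref = b + sum(resp[i][k] * (2 - genotypes[i][l]) for i in range(n))
                both = digamma(alt + ref)
                row_alt.append(digamma(alt) - both)  # E[log p_kl]
                row_ref.append(digamma(ref) - both)  # E[log (1 - p_kl)]
            e_log_alt.append(row_alt)
            e_log_ref.append(row_ref)

        # q(z_i) is a softmax over the expected log joint for each cluster.
        for i in range(n):
            logits = [
                e_log_pi[k] + sum(
                    g * e_log_alt[k][l] + (2 - g) * e_log_ref[k][l]
                    for l, g in enumerate(genotypes[i])
                )
                for k in range(n_clusters)
            ]
            top = max(logits)
            weights = [math.exp(v - top) for v in logits]
            resp[i] = [w / sum(weights) for w in weights]
    return resp


# Toy data (made up for illustration): four individuals carrying mostly the
# alternate allele, followed by four carrying mostly the reference allele.
toy_genotypes = [
    [2, 2, 1, 2, 2, 2],
    [2, 1, 2, 2, 2, 2],
    [2, 2, 2, 2, 1, 2],
    [1, 2, 2, 2, 2, 2],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
responsibilities = fit_populations(toy_genotypes, n_clusters=2)
labels = [row.index(max(row)) for row in responsibilities]
print(labels)
```

The sketch only covers the assignment of individuals to sub-populations. Building a disease graph per sub-population, as described in the abstract, is a separate step that is not shown here.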
Keywords: Variational Bayes inference, Markov chain Monte Carlo, Association mapping in GWAS, Disease graph, SNP interactions
- Datasets: Synthetic and real datasets
- Docs: Presentation slides and some supplementary proofs
- Images: Plots that illustrate our results
- MySTRUCTURE: Final implementation of our proposed method
- Python: Some prototype implementations used as proofs of concept
- Tools: Utility tools used for generating gene pools and RNAseq mapping