List of studies that have used this data #2

Open
dhimmel opened this issue Jul 19, 2016 · 3 comments


@dhimmel
Collaborator

dhimmel commented Jul 19, 2016

The goal of this issue is to compile a list of studies that have used our model-free data.

@dhimmel
Collaborator Author

dhimmel commented Jul 19, 2016

Learning Vector Quantization with Local Adaptive Weighting for Relevance Determination in Genome-Wide Association Studies
Flavia R B Araujo, Hansenclever F Bassani, Aluizio F R Araujo
Neural Networks (IJCNN) (2013) DOI: 10.1109/IJCNN.2013.6707040

To evaluate the scalability of DSEL-LVQ we considered the datasets described in [22] with interactions of three, four and five SNPs. However, these datasets included only the relevant SNPs; therefore, to produce datasets with 20, 50 and 100 SNPs, we randomly selected 800 and 1600 individuals from [22] and combined them with irrelevant SNPs (noisy data) randomly selected from [21]. This resulted in 100 data files for each combination of population size, number of interacting SNPs and number of irrelevant SNPs, all equally balanced between cases and controls.
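For anyone reproducing this kind of benchmark, the construction above can be sketched roughly as follows. This is a minimal illustration, not the DSEL-LVQ authors' code: the helper name `embed_relevant_snps`, the array layouts, and the choice to shuffle column order are all assumptions. It draws a balanced set of cases and controls from a relevant-SNP-only dataset, pads each individual with irrelevant SNP columns drawn from a separate noise pool, and reports where the relevant SNPs ended up.

```python
import numpy as np

def embed_relevant_snps(relevant, labels, noise_pool, n_individuals, n_total_snps, rng):
    """Hypothetical helper: balanced subsample of a relevant-SNP dataset, padded with noise SNPs.

    relevant:   (n, k) genotype matrix containing only the interacting SNPs
    labels:     (n,) array with 1 for cases and 0 for controls
    noise_pool: (m, p) genotype matrix of irrelevant SNPs from another dataset
    """
    half = n_individuals // 2
    # Draw an equal number of cases and controls without replacement
    cases = rng.choice(np.flatnonzero(labels == 1), size=half, replace=False)
    controls = rng.choice(np.flatnonzero(labels == 0), size=half, replace=False)
    rows = np.concatenate([cases, controls])

    # Randomly pick irrelevant SNP columns and donor rows from the noise pool
    n_noise = n_total_snps - relevant.shape[1]
    noise_cols = rng.choice(noise_pool.shape[1], size=n_noise, replace=False)
    noise_rows = rng.choice(noise_pool.shape[0], size=rows.size, replace=False)
    X = np.hstack([relevant[rows], noise_pool[np.ix_(noise_rows, noise_cols)]])

    # Shuffle column order (an assumption) so the relevant SNPs are not always first;
    # new column j holds old column perm[j], and old columns < k were the relevant ones
    perm = rng.permutation(n_total_snps)
    X = X[:, perm]
    relevant_idx = np.flatnonzero(perm < relevant.shape[1])
    return X, labels[rows], relevant_idx
```

Returning `relevant_idx` makes it easy to score whether a method recovers the planted interaction among the noise SNPs.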

@dhimmel
Collaborator Author

dhimmel commented Jul 19, 2016

Cuckoo search epistasis: a new method for exploring significant genetic interactions
M Aflakparast, H Salimi, A Gerami, M-P Dubé, S Visweswaran, A Masoudi-Nejad
Heredity (2014) DOI: 10.1038/hdy.2014.4

We also used Himmelstein data sets with three to five functional SNPs, which had been generated with no predefined genetic models, to evaluate methods in identifying higher order interactions. For any interaction order, the data folders consisted of 100 data sets, each having 1500 cases and 1500 controls and as many SNPs as the considered interaction order. Assuming Hardy-Weinberg equilibrium proportions and MAF of 0.5, we randomly generated additional SNP data to embed with the Himmelstein data using a multinomial distribution. After embedding Himmelstein data with our generated data sets, the resulting data sets for any interaction order contained 1000 SNPs for 3000 samples. These data sets are available online from http://discovery.dartmouth.edu/model_free_data/.
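The padding step described in that passage can be sketched as below. This is an illustrative reconstruction under the stated assumptions (Hardy-Weinberg proportions, MAF 0.5, 1000 total SNPs), not the Cuckoo search authors' code; the function names are hypothetical. Under HWE with minor allele frequency p, the genotype probabilities for 0, 1 and 2 minor alleles are (1-p)², 2p(1-p) and p², which at p = 0.5 gives 0.25, 0.5 and 0.25. Each noise genotype is then a single multinomial (categorical) draw over those three categories.

```python
import numpy as np

def hwe_genotype_probs(maf):
    """Probabilities of carrying 0, 1 or 2 minor alleles under Hardy-Weinberg equilibrium."""
    p = maf
    return np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

def simulate_noise_snps(n_samples, n_snps, maf=0.5, rng=None):
    """Hypothetical helper: independent noise SNPs, one categorical draw per genotype."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(3, size=(n_samples, n_snps), p=hwe_genotype_probs(maf))

def embed_functional_snps(functional, total_snps=1000, maf=0.5, rng=None):
    """Hypothetical helper: append simulated noise SNPs to the functional-SNP columns."""
    n_samples, n_functional = functional.shape
    noise = simulate_noise_snps(n_samples, total_snps - n_functional, maf, rng)
    return np.hstack([functional, noise])
```

For a 3-SNP Himmelstein file with 3000 samples, this appends 997 simulated SNPs to reach 3000 × 1000, matching the dimensions quoted above.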

@dhimmel
Collaborator Author

dhimmel commented Jul 19, 2016

CINOEDV: a co-information based method for detecting and visualizing n-order epistatic interactions
Junliang Shang, Yingxia Sun, Jin-Xing Liu, Junfeng Xia, Junying Zhang, and Chun-Hou Zheng
BMC Bioinformatics (2016) DOI: 10.1186/s12859-016-1076-8

For assessing the capability of CINOEDV in inferring higher order epistatic interactions from the epistasis hypergraph, four models are used that have been developed previously [49, 50], namely, Three-1, Three-2, Four and Five. Three-1 is a model of 3-order epistatic interaction displaying both marginal effects and interaction effects. Three-2 is a pure model of 3-order epistatic interaction, where the association to the phenotype is only observable when all 3 ground-truth SNPs are considered together, that is, no main effects and no pairwise epistatic interactions. Similarly, Four and Five are models of 4-order and 5-order epistatic interactions, each displaying no main effects and no 2-order interaction effects. For each corresponding data set also generated by epiSIM [44], 1500 cases and 1500 controls are included and genotyped by 1000 SNPs.
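To make the idea of a "pure" model concrete, here is a small exact check using a toy parity (XOR-like) penetrance function. This is a hypothetical stand-in chosen for illustration, not one of the models from [49, 50] or epiSIM; the penetrance values 3/5 and 2/5 are arbitrary assumptions. With MAF 0.5 under HWE, each SNP's minor-allele count is odd with probability 1/2, so the parity of the total count stays uniform as long as at least one SNP is left unobserved. Every single-SNP and pairwise conditional penetrance therefore equals the overall prevalence, and risk only shifts once all SNPs in the model are considered together.

```python
from fractions import Fraction
from itertools import combinations, product

# Genotype frequencies (0, 1, 2 minor alleles) under HWE with MAF 0.5
GENOTYPE_FREQ = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def parity_penetrance(genotypes, high=Fraction(3, 5), low=Fraction(2, 5)):
    """Toy model: risk depends only on whether the total minor-allele count is odd."""
    return high if sum(genotypes) % 2 == 1 else low

def conditional_penetrance(order, fixed):
    """Exact P(case | genotypes in `fixed`), averaging over the unobserved SNPs."""
    total = Fraction(0)
    for g in product(GENOTYPE_FREQ, repeat=order):
        if any(g[i] != v for i, v in fixed.items()):
            continue
        weight = Fraction(1)
        for i, gi in enumerate(g):
            if i not in fixed:
                weight *= GENOTYPE_FREQ[gi]
        total += weight * parity_penetrance(g)
    return total

def is_pure_epistatic(order):
    """True if no subset of fewer than `order` SNPs shifts risk away from the prevalence."""
    prevalence = conditional_penetrance(order, {})
    return all(
        conditional_penetrance(order, dict(zip(idx, vals))) == prevalence
        for k in range(1, order)
        for idx in combinations(range(order), k)
        for vals in product(GENOTYPE_FREQ, repeat=k)
    )
```

Using `Fraction` keeps the check exact, so "no main effects" means equality rather than approximate closeness. A marginal or pairwise scan would see nothing in data simulated from such a model, which is why these benchmarks stress higher order search.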
