
GWAS data

Jonathan Hess edited this page Jul 21, 2017 · 10 revisions

What data does FLEET accept?

It is expected that you will be analyzing a file of genome-wide association study (GWAS) summary statistics. This file should have at least two columns:

  1. Single nucleotide polymorphism (SNP) identifier matching 1000 Genomes rsIDs (--snp-field)
  2. P-value (--clump-field)
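As a quick sanity check before handing a file to FLEET, the two required columns can be verified with a few lines of Python. This is only a sketch and not part of FLEET itself: the function name `check_sumstats`, the column names `SNP` and `P`, and the assumption that the file is whitespace-delimited with a header row are all illustrative.

```python
import io


def check_sumstats(handle, snp_field="SNP", p_field="P"):
    """Return a list of problems found in a whitespace-delimited summary file.

    snp_field and p_field are assumed header names; change them to match
    the columns in your own file.
    """
    header = handle.readline().split()
    problems = [f"missing column: {name}"
                for name in (snp_field, p_field) if name not in header]
    if problems:
        return problems

    snp_i, p_i = header.index(snp_field), header.index(p_field)
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) <= max(snp_i, p_i):
            problems.append(f"line {line_no}: too few columns")
            continue
        if not fields[snp_i].startswith("rs"):
            problems.append(f"line {line_no}: '{fields[snp_i]}' is not an rsID")
        try:
            p_value = float(fields[p_i])
        except ValueError:
            problems.append(f"line {line_no}: '{fields[p_i]}' is not a number")
            continue
        if not 0.0 <= p_value <= 1.0:
            problems.append(f"line {line_no}: p-value {p_value} is outside [0, 1]")
    return problems


# Small in-memory example; a real file would be opened with open(path).
example = io.StringIO("SNP P\nrs123 0.04\nchr1:1000 0.5\nrs456 NA\n")
for problem in check_sumstats(example):
    print(problem)
```

The check only looks at the identifier prefix, so it cannot tell whether an rsID is actually present in the 1000 Genomes reference data.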

FLEET automatically runs Plink's --clump algorithm on chromosomes 1-22, using the following parameters by default:

  1. --clump-p1 1.0
  2. --clump-p2 1.0
  3. --clump-r2 0.1
  4. --clump-kb 1000
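For readers who want to reproduce the clumping step by hand, the defaults above correspond roughly to one Plink call per autosome. The sketch below only builds the command lines and does not run them. How FLEET invokes Plink internally is not documented here, so the helper `clump_command`, the file name `gwas.txt`, the prefix `reference`, and the output naming are hypothetical; the flags themselves are standard Plink 1.9 options.

```python
import shlex


def clump_command(sumstats, bfile_prefix, chrom,
                  snp_field="SNP", p_field="P", out_prefix="clumped"):
    """Build (but do not run) a Plink --clump command for one chromosome."""
    return [
        "plink",
        "--bfile", bfile_prefix,          # hypothetical reference panel prefix
        "--chr", str(chrom),
        "--clump", sumstats,
        "--clump-snp-field", snp_field,   # column holding rsIDs
        "--clump-field", p_field,         # column holding p-values
        "--clump-p1", "1.0",              # index variant p-value threshold
        "--clump-p2", "1.0",              # secondary p-value threshold
        "--clump-r2", "0.1",              # LD (r^2) threshold
        "--clump-kb", "1000",             # physical distance threshold in kb
        "--out", f"{out_prefix}.chr{chrom}",
    ]


# One command per autosome, chromosomes 1 through 22.
commands = [clump_command("gwas.txt", "reference", c) for c in range(1, 23)]
print(shlex.join(commands[0]))
```

With both p-value thresholds at 1.0, no variant is excluded for significance, so the clumps are determined by the r2 and distance thresholds alone.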

Reference data from the 1000 Genomes Project (1000G) are automatically pre-pruned to remove variants in high linkage disequilibrium (LD), using Plink's --indep algorithm:

  1. --maf 0.05 --indep 100 5 2
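In Plink, the three arguments to --indep are the window size (in variants), the step size, and the variance inflation factor (VIF) threshold, so the setting above scans windows of 100 variants, shifts by 5 variants at a time, and removes variants with a VIF above 2. The sketch below builds the equivalent commands without running them. The helper `prune_commands` and the prefixes are hypothetical, and the second command (extracting the retained variants into a new fileset) is an assumption about how the pruned list might be applied, not a description of FLEET's internals.

```python
import shlex


def prune_commands(bfile_prefix, out_prefix):
    """Build two Plink commands: find LD-independent variants, then keep them."""
    find = [
        "plink",
        "--bfile", bfile_prefix,
        "--maf", "0.05",              # drop variants with minor allele frequency < 5%
        "--indep", "100", "5", "2",   # window (variants), step, VIF threshold
        "--out", out_prefix,          # writes <out_prefix>.prune.in / .prune.out
    ]
    keep = [
        "plink",
        "--bfile", bfile_prefix,
        "--extract", f"{out_prefix}.prune.in",  # variants retained by --indep
        "--make-bed",
        "--out", out_prefix,
    ]
    return find, keep


find_cmd, keep_cmd = prune_commands("reference", "reference_pruned")
print(shlex.join(find_cmd))
print(shlex.join(keep_cmd))
```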

Note that the CEUqc_1kg_phase1_* reference data were originally downloaded from the Plink resources page. I extracted the samples corresponding to the CEU population and applied the following quality control filters:

  1. --filter-founders
  2. --maf 0.01
  3. --geno 0.05
  4. --mind 0.05
  5. --hwe 1e-06

Example GWAS data

https://www.med.unc.edu/pgc/results-and-downloads
