Simple Sum 2 (SS2) is a summary statistics-based colocalization tool that extends the frequently implemented Simple Sum method (Gong et al., 2019) into a flexible procedure. SS2 integrates GWAS summary statistics with eQTL summary statistics across any number of gene-by-tissue pairs, is applicable when the two studies have overlapping participants, and can be applied to GWAS summary statistics computed through meta-analysis, even with related individuals.
-
The sampledata folder provides all the sample datasets needed to conduct the Simple Sum 2 colocalization analysis, including the GWAS and eQTL files for two genes, MUC20 and MUC4, in human nasal epithelial tissue.
-
SS2_functions.R contains all the R functions for conducting the SS2 test, which has been implemented in the web-based tool LocusFocus, with an additional R option for using the chi-square statistics as the eQTL evidence measure.
-
Commands to run Simple Sum 2 for one GWAS/eQTL combination:
-
Obtain the GWAS p-values from the GWAS summary statistics file:
    gwas_file=read.table(file='sampledata/GWAS_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=TRUE)
    gwas_pvalue=gwas_file$ALL.FIXED.PVAL
-
Obtain the eQTL p-values from the eQTL summary statistics file:
    eqtl_file=read.table(file='sampledata/eqtl_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=TRUE)
    eqtl_pvalue=eqtl_file$pval
-
Obtain the covariance matrices for the GWAS and eQTL summary statistics:
    cov_GWAS=as.matrix(read.table(file='sampledata/ldGWAS_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=F))
    cov_eQTL=as.matrix(read.table(file='sampledata/ldeqtl_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=F))
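Before running the tests, it may help to confirm that the inputs line up. The following is a minimal sketch and is not part of SS2_functions.R: the helper name check_ss2_inputs is hypothetical, and it assumes that p-values lie in (0, 1] and that each covariance matrix is square, symmetric, and has one row per SNP in the matching p-value vector. The toy values are made up.

```r
# Hypothetical helper (not part of SS2_functions.R): basic consistency checks
# on a p-value vector and its matching covariance matrix.
check_ss2_inputs <- function(pvalue, cov_mat) {
  stopifnot(
    is.numeric(pvalue),
    all(!is.na(pvalue)),
    all(pvalue > 0 & pvalue <= 1),
    is.matrix(cov_mat),
    nrow(cov_mat) == ncol(cov_mat),
    nrow(cov_mat) == length(pvalue),
    # unname() drops row/column names so only the values are compared.
    isSymmetric(unname(cov_mat), tol = 1e-8)
  )
  invisible(TRUE)
}

# Toy data standing in for the sample files (values are made up).
toy_pvalue <- c(0.01, 0.20, 0.003)
toy_cov <- matrix(c(1.0, 0.5, 0.2,
                    0.5, 1.0, 0.3,
                    0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)
check_ss2_inputs(toy_pvalue, toy_cov)
```

With the sample data, the same check could be applied as check_ss2_inputs(gwas_pvalue, cov_GWAS) and check_ss2_inputs(eqtl_pvalue, cov_eQTL).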
-
Run the SS2 colocalization analysis: stage1(eqtl_pvalue,cov_eQTL) produces the p-value for the stage 1 test. stage2(eqtl_pvalue,gwas_pvalue,cov_GWAS,use_evidence='log10p') produces the p-value for the stage 2 test, using -log10(eQTL p-value) as the eQTL evidence measure. To use the chi-square statistics as the eQTL evidence measure instead, specify use_evidence='Tsquare'.
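The two evidence measures can be illustrated directly from a vector of eQTL p-values. This is a sketch under an assumption: it treats the chi-square statistic as the 1-degree-of-freedom statistic implied by each p-value, which may differ in detail from how SS2_functions.R computes it. The vector toy_eqtl_pvalue is made-up data.

```r
# Made-up eQTL p-values for three SNPs.
toy_eqtl_pvalue <- c(0.05, 0.001, 0.5)

# Evidence measure 1: -log10 of the eQTL p-value.
evidence_log10p <- -log10(toy_eqtl_pvalue)

# Evidence measure 2 (assumed form): the 1-df chi-square statistic whose
# upper-tail probability equals the p-value.
evidence_chisq <- qchisq(toy_eqtl_pvalue, df = 1, lower.tail = FALSE)

print(cbind(pvalue = toy_eqtl_pvalue, evidence_log10p, evidence_chisq))
```

Both measures increase as the p-value decreases, so they rank SNPs identically; they differ in scale, which can change how strongly the most significant eQTL SNPs are weighted.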
-
-
SS2_sampledataset_code.R implements the above commands by calling functions in SS2_functions.R.
-
ld_function.R provides the R functions for calculating the covariance matrix of Z-scores obtained from meta-analysis with either independent or related samples.
-
R function getcov_Zmeta produces the covariance of Z-scores from a meta-analysis according to an equation defined in terms of the following quantities:
- $Z_j$: the Z-score for SNP $j$ obtained from a meta-analysis with $K$ sub-studies.
- $Z_{jk}$: the Z-score for SNP $j$ from study $k$.
- $w_{jk}$: the weight for SNP $j$ from study $k$.
- $m$: the total number of SNPs at the locus.
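As a hedged sketch only, a covariance of this kind can be written out under two assumptions that are not confirmed by this README: that the meta-analysis Z-score is a weighted sum of the sub-study Z-scores, and that the sub-studies are independent of one another. The notation is also assumed: $Z_j$ is the meta-analysis Z-score for SNP $j$, $Z_{jk}$ is the Z-score for SNP $j$ from study $k$, $w_{jk}$ is the corresponding weight, $K$ is the number of sub-studies, and $m$ is the number of SNPs at the locus.

```latex
Z_j = \sum_{k=1}^{K} w_{jk} Z_{jk}, \qquad
\mathrm{Cov}(Z_i, Z_j) = \sum_{k=1}^{K} w_{ik}\, w_{jk}\, \mathrm{Cov}(Z_{ik}, Z_{jk}),
\quad i, j = 1, \dots, m .
```

The cross-study terms $\mathrm{Cov}(Z_{ik}, Z_{jl})$ with $k \neq l$ vanish under the independence assumption, which is why only one sum over $k$ remains.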
-
Input variables for getcov_Zmeta:
- weightmatrix: an $m \times K$ matrix (SNPs by sub-studies), where the $(j,k)$th element is the weight $w_{jk}$. An example is provided in the file weightmatrix.txt.
- var_subgroup: a list with $K$ elements, where each element is the covariance matrix of Z-scores for a particular study. An example is provided in the file 13subgroup_ld.Rdata.
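The way these two inputs could combine can be sketched as follows. This is not the repository's getcov_Zmeta: the function name sketch_cov_zmeta is hypothetical, and the calculation assumes independent sub-studies and a meta-analysis Z-score formed as a weighted sum of sub-study Z-scores, so that element (i, j) of the result is the sum over studies of w_ik * w_jk * Cov(Z_ik, Z_jk). The toy inputs are made up.

```r
# Hypothetical sketch (not the repository's getcov_Zmeta): covariance of
# meta-analysis Z-scores, assuming independent sub-studies and
# Z_j = sum_k w_jk * Z_jk.
sketch_cov_zmeta <- function(weightmatrix, var_subgroup) {
  K <- ncol(weightmatrix)
  stopifnot(length(var_subgroup) == K)
  terms <- lapply(seq_len(K), function(k) {
    w_k <- weightmatrix[, k]
    # Element (i, j) of this term is w_ik * w_jk * Cov(Z_ik, Z_jk).
    outer(w_k, w_k) * var_subgroup[[k]]
  })
  # Sum the per-study contributions.
  Reduce(`+`, terms)
}

# Toy example: m = 2 SNPs, K = 2 sub-studies (values are made up).
toy_weights <- matrix(1 / sqrt(2), nrow = 2, ncol = 2)
toy_var <- list(
  matrix(c(1, 0.4, 0.4, 1), nrow = 2),
  matrix(c(1, 0.6, 0.6, 1), nrow = 2)
)
toy_cov_meta <- sketch_cov_zmeta(toy_weights, toy_var)
print(toy_cov_meta)
```

In this toy case the squared weights sum to 1 for each SNP, so the diagonal of the result is 1 and the off-diagonal is 0.5, the average of the two sub-study correlations.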
-
R function get_subgroup_var calculates the covariance of Z-scores for studies that contain related individuals, according to equation:
-
Input variables for get_subgroup_var:
-
Examples of using the R functions getcov_Zmeta and get_subgroup_var are provided in the R script ld_function.R.
-