FanWang0216/SimpleSum2Colocalization

Simple Sum 2

Simple Sum 2 (SS2) is a summary statistics-based colocalization tool. It extends the widely used Simple Sum method (Gong et al., 2019) into a flexible procedure that integrates GWAS summary statistics with eQTL summary statistics across any number of gene-by-tissue pairs. SS2 remains applicable when the two studies share participants, and it can be applied to GWAS summary statistics computed through meta-analysis, even when the samples include related individuals.

  • The sampledata folder provides the sample datasets needed to run the Simple Sum 2 colocalization analysis, including the GWAS and eQTL files for two genes, MUC20 and MUC4, in human nasal epithelial tissue.

  • SS2_functions.R contains the R functions for conducting the SS2 test. The test is also implemented in the web-based tool LocusFocus; the R functions additionally offer the option of using chi-square statistics as the eQTL evidence measure.

  • Commands to run Simple Sum 2 for one GWAS/eQTL combination:

    1. Obtain the GWAS p-values from the GWAS summary statistics file: gwas_file=read.table(file='sampledata/GWAS_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=T); gwas_pvalue=gwas_file$ALL.FIXED.PVAL

    2. Obtain the eQTL p-values from the eQTL summary statistics file: eqtl_file=read.table(file='sampledata/eqtl_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=T); eqtl_pvalue=eqtl_file$pval

    3. Obtain the covariance matrices for the GWAS and eQTL summary statistics, respectively: cov_GWAS=as.matrix(read.table(file='sampledata/ldGWAS_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=F)); cov_eQTL=as.matrix(read.table(file='sampledata/ldeqtl_file_gene_MUC4_tissue_Human_nasal_epithelial.txt',header=F))

    4. Run the SS2 colocalization analysis: stage1(eqtl_pvalue,cov_eQTL) returns the p-value for the stage 1 test. stage2(eqtl_pvalue,gwas_pvalue,cov_GWAS,use_evidence='log10p') returns the p-value for the stage 2 test, using -log10(eQTL p) as the eQTL evidence measure. To use the chi-square statistics as the eQTL evidence measure instead, specify use_evidence='Tsquare'.
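
The two eQTL evidence measures named in step 4 can be illustrated on toy p-values. This is only a sketch: it assumes the chi-square statistic is the 1-degree-of-freedom statistic implied by each p-value, which may differ from how SS2_functions.R derives it internally. The p-values below are made up for illustration.

```r
# Toy eQTL p-values (illustrative only, not taken from the sample data).
eqtl_pvalue <- c(0.5, 0.05, 1e-4)

# Evidence measure 1: -log10 of the eQTL p-value.
evidence_log10p <- -log10(eqtl_pvalue)

# Evidence measure 2 (assumption): the 1-df chi-square statistic whose
# upper-tail probability equals the p-value.
evidence_chisq <- qchisq(eqtl_pvalue, df = 1, lower.tail = FALSE)

print(cbind(eqtl_pvalue, evidence_log10p, evidence_chisq))
```

Both measures grow as the p-value shrinks, so they rank SNPs identically and differ only in scale.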

  • SS2_sampledataset_code.R implements the above commands by calling functions in SS2_functions.R.
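
Before calling stage1() and stage2(), it can help to confirm that the inputs line up. The helper below, check_ss2_inputs, is hypothetical and not part of SS2_functions.R. The checks it performs (equal-length p-value vectors, p-values in (0, 1], square symmetric covariance matrices with one row per SNP) are assumptions about what a well-formed input looks like, and the toy data is made up.

```r
# Hypothetical helper (not part of SS2_functions.R): sanity-check inputs
# before passing them to stage1()/stage2().
check_ss2_inputs <- function(gwas_pvalue, eqtl_pvalue, cov_GWAS, cov_eQTL) {
  m <- length(gwas_pvalue)
  if (length(eqtl_pvalue) != m) {
    stop("GWAS and eQTL p-value vectors differ in length")
  }
  all_p <- c(gwas_pvalue, eqtl_pvalue)
  if (anyNA(all_p) || any(all_p <= 0 | all_p > 1)) {
    stop("p-values must be non-missing and lie in (0, 1]")
  }
  for (cov_mat in list(cov_GWAS, cov_eQTL)) {
    if (!is.matrix(cov_mat) || any(dim(cov_mat) != m)) {
      stop("each covariance matrix must be m x m, with m the number of SNPs")
    }
    if (!isSymmetric(unname(cov_mat), tol = 1e-8)) {
      stop("covariance matrices must be symmetric")
    }
  }
  invisible(TRUE)
}

# Toy inputs: 5 SNPs with an AR(1)-style correlation structure.
set.seed(1)
m <- 5
toy_ld <- 0.5^abs(outer(1:m, 1:m, "-"))
gwas_pvalue <- runif(m)
eqtl_pvalue <- runif(m)
check_ss2_inputs(gwas_pvalue, eqtl_pvalue, toy_ld, toy_ld)
```

With the sample data loaded as in steps 1 to 3, the same call would take gwas_pvalue, eqtl_pvalue, cov_GWAS and cov_eQTL directly.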

  • ld_function.R provides the R functions for calculating the covariance matrix of Z-scores obtained from meta-analysis with either independent or related samples.

    • R function getcov_Zmeta produces the covariance of Z-scores from a meta-analysis according to the equation:

      \begin{aligned} \operatorname{cov}\left(Z_{meta, j}, Z_{meta, l}\right) &=\operatorname{cov}\left(\frac{\sum_{c=1}^{C} \sqrt{w_{c, j}} Z_{c, j}}{\sqrt{\sum_{c=1}^{C} w_{c, j}}}, \frac{\sum_{c=1}^{C} \sqrt{w_{c, l}} Z_{c, l}}{\sqrt{\sum_{c=1}^{C} w_{c, l}}}\right) \\ &=\frac{\sum_{c=1}^{C} \sqrt{w_{c, j}} \sqrt{w_{c, l}} \operatorname{cov}\left(Z_{c, j}, Z_{c, l}\right)}{\sqrt{\sum_{c=1}^{C} w_{c, j}} \sqrt{\sum_{c=1}^{C} w_{c, l}}}; \end{aligned}

      Z_{meta,j}: the Z-score for SNP j obtained from a meta-analysis with C sub-studies. Z_{c,j}: the Z-score for SNP j from study c. w_{c,j}: the weight for SNP j from study c. m: the total number of SNPs at the locus.

    • Input variables for getcov_Zmeta:

      1. weightmatrix: an m\times C matrix whose (j,c)th element is the weight w_{c,j}. An example is provided in the file weightmatrix.txt.
      2. var_subgroup: a list with C elements, where each element is the covariance matrix of Z-scores for a particular study. An example is provided in the file 13subgroup_ld.Rdata.
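
The meta-analysis covariance can be sketched in a few lines of R. The function meta_z_cov_sketch below is a hypothetical illustration, not the getcov_Zmeta implementation in ld_function.R. It assumes the weight matrix has one row per SNP and one column per sub-study, and that sub-studies are independent, so cross-study covariance terms vanish and each study's covariance enters the sum with its own weights.

```r
# Hypothetical sketch (not the code in ld_function.R).
# weightmatrix: m x C matrix of weights w_{c,j} (rows = SNPs, cols = studies).
# var_subgroup: list of C matrices, each m x m, cov(Z_{c,j}, Z_{c,l}).
meta_z_cov_sketch <- function(weightmatrix, var_subgroup) {
  n_snps <- nrow(weightmatrix)
  n_studies <- ncol(weightmatrix)
  stopifnot(length(var_subgroup) == n_studies)
  sqrt_w <- sqrt(weightmatrix)
  numerator <- matrix(0, n_snps, n_snps)
  for (c_idx in seq_len(n_studies)) {
    # sqrt(w_{c,j}) * sqrt(w_{c,l}) * cov(Z_{c,j}, Z_{c,l}) for all (j, l)
    numerator <- numerator + tcrossprod(sqrt_w[, c_idx]) * var_subgroup[[c_idx]]
  }
  # sqrt(sum_c w_{c,j}) * sqrt(sum_c w_{c,l}) for all (j, l)
  denom <- sqrt(rowSums(weightmatrix))
  numerator / tcrossprod(denom)
}

# Toy example: 3 SNPs, 2 studies with weights 100 and 300 (made-up values).
ld_a <- diag(3)                      # study 1: uncorrelated SNPs
ld_b <- matrix(0.4, 3, 3)            # study 2: pairwise correlation 0.4
diag(ld_b) <- 1
weights <- cbind(rep(100, 3), rep(300, 3))
cov_meta <- meta_z_cov_sketch(weights, list(ld_a, ld_b))
print(cov_meta)
```

When every SNP has the same weight within a study, as here, the result reduces to the weight-averaged per-study covariance: the off-diagonal entries are 300 × 0.4 / 400 = 0.3 and the diagonal is 1.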
    • R function get_subgroup_var calculates the covariance of Z-scores for studies that contain related individuals, according to the equation:

      \begin{aligned} \operatorname{cov}\left(Z_{c, j}, Z_{c, l}\right)=\frac{G_{j}^{\top} P^{*} G_{l}}{\sqrt{G_{j}^{\top} P^{*} G_{j}} \sqrt{G_{l}^{\top} P^{*} G_{l}}}, \text { with } P^{*}=\Sigma^{-1}-\Sigma^{-1} X\left(X^{\top} \Sigma^{-1} X\right)^{-1} X^{\top} \Sigma^{-1}. \end{aligned}

    • Input variables for get_subgroup_var:

      1. G: an n_{c}\times m genotype matrix.
      2. X: an n_{c}\times q matrix of covariates (e.g. sex and age), including the intercept.
      3. Sigma ( \Sigma ): an n_{c}\times n_{c} matrix that encodes sample relatedness. When Sigma is unknown, it can be estimated with the R package nlme or GMMAT.
    • Examples of using the R functions getcov_Zmeta and get_subgroup_var are provided in the R script ld_function.R.
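
The related-sample covariance can be sketched directly from its definition. The function related_z_cov_sketch below is a hypothetical illustration, not the get_subgroup_var implementation in ld_function.R. It forms P* from Sigma and X, computes G' P* G, and rescales it to unit diagonal. The genotypes are simulated toy data.

```r
# Hypothetical sketch (not the code in ld_function.R).
# G: n x m genotype matrix; X: n x q covariates incl. intercept;
# Sigma: n x n relatedness (co)variance matrix.
related_z_cov_sketch <- function(G, X, Sigma) {
  Sigma_inv <- solve(Sigma)
  SiX <- Sigma_inv %*% X                                  # Sigma^{-1} X
  # P* = Sigma^{-1} - Sigma^{-1} X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1}
  P_star <- Sigma_inv - SiX %*% solve(crossprod(X, SiX), t(SiX))
  GPG <- crossprod(G, P_star %*% G)                       # G' P* G, m x m
  scale <- sqrt(diag(GPG))
  GPG / tcrossprod(scale)                                 # unit diagonal
}

# Toy data: 50 unrelated individuals, 4 SNPs, intercept-only covariates.
set.seed(42)
n <- 50
m <- 4
G <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
X <- matrix(1, nrow = n, ncol = 1)
Sigma <- diag(n)
cov_z <- related_z_cov_sketch(G, X, Sigma)
print(round(cov_z, 3))
```

With Sigma set to the identity and an intercept-only X, P* is the centering matrix, so the result equals the sample Pearson correlation cor(G). This gives a convenient check before supplying an estimated Sigma for related samples.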
