
Finemapping pipeline

This repository collects the set of scripts needed to run fine-mapping on genetic association studies. We use the Snakemake workflow management system, which is based on Python. For additional details, refer to the Snakemake homepage.

In particular, we use the recently developed SuSiE algorithm through its R implementation, susieR. For further details on the algorithm, please refer to the original paper, Wang et al. 2020.

Overview

The pipeline starts from the summary statistics generated by regenie and applies a clumping step based on parameters defined in the configuration file. Afterwards, clumps are enlarged to a minimum size of 1 Mb, and any overlapping clumps are merged into a single region. The susieR algorithm is then applied to each region, and any credible set found is reported in the summary file.
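The enlarge-and-merge step described above can be sketched as follows. This is a minimal illustration of the logic, not the pipeline's actual code; the symmetric padding strategy and the (start, end) interval representation are assumptions:

```python
def enlarge_and_merge(clumps, min_size=1_000_000):
    """Enlarge each clump to at least min_size bp and merge overlaps.

    clumps: list of (start, end) base-pair intervals on one chromosome.
    Returns the merged list of regions, sorted by start position.
    Illustrative sketch only, not the pipeline's actual implementation.
    """
    # Pad each clump symmetrically until it reaches the minimum size.
    enlarged = []
    for start, end in clumps:
        pad = max(0, (min_size - (end - start)) // 2)
        enlarged.append((max(0, start - pad), end + pad))

    # Merge any regions that overlap after enlargement.
    merged = []
    for start, end in sorted(enlarged):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, two 10 kb clumps centred near 1.5 Mb and 2.0 Mb each grow to 1 Mb, overlap, and collapse into a single region of about 1.5 Mb.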

Input:

  • Summary statistics files: pheno.regenie.gz.
  • Phenotype file (required) containing the original phenotypes used for the GWAS.
  • Genotype plink file set (.bed, .bim, .fam) matching the GWAS analysis.
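regenie writes its summary statistics as a gzip-compressed, space-delimited text file with a single header line. A minimal sketch of loading such a file (the column names used in the note below follow regenie's documented output, but verify them against your own files):

```python
import csv
import gzip

def read_regenie(path):
    """Load a regenie summary-statistics file into a list of dicts.

    Illustrative sketch only: assumes a gzip-compressed, space-delimited
    file with one header line, as regenie produces by default.
    """
    with gzip.open(path, "rt") as fh:
        return list(csv.DictReader(fh, delimiter=" "))
```

Each row is then addressable by column name, e.g. row["LOG10P"] for the association p-value on the -log10 scale.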

Output:

  • Summary TSV file with the lead variant of each credible set (see the Output section below).
Installation

Requirements

- snakemake=8.4.8
- snakemake-executor-plugin-slurm
- git

These requirements are optional if already installed by the system administrator or already available in a conda environment.

See Install snakemake for further information and specific parameters.

conda create -n snakemake bioconda::snakemake bioconda::snakemake-executor-plugin-slurm

Pipeline installation

  1. Clone this repo into your working directory:
git clone https://github.com/EuracBiomedicalResearch/finemap_pipeline
cd finemap_pipeline
  2. Write a configuration file.

All the available parameters are defined in a configuration file written in YAML. Take the file config/config.yaml as an example and modify it according to your needs.
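For illustration, a configuration file might look like the sketch below. All key names here are hypothetical; consult config/config.yaml in the repository for the actual schema:

```yaml
# Hypothetical example: key names are illustrative only.
# See config/config.yaml in the repository for the real schema.
sumstat: results/pheno.regenie.gz   # regenie summary statistics
pheno_file: data/phenotypes.tsv     # phenotypes used for the GWAS
genotype: data/genotypes            # plink file set prefix (.bed/.bim/.fam)
clumping:
  p1: 5e-8   # p-value threshold for index variants
  r2: 0.1    # LD threshold for clump membership
```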

Running the pipeline

  1. Activate the conda environment:
conda activate snakemake
  2. Perform a dry-run to see the number of jobs to be submitted:
snakemake --configfile config/config.yaml -n
  3. Submit the workflow to slurm.

NB: See the snakemake documentation on how to create a slurm profile for submitting jobs.

  • Snakemake documentation
  • Snakemake profiles

snakemake --configfile config/config.yaml --profile ~/snake_prof/slurm \
  --executor slurm \
  --latency-wait 60 \
  --nolock

Output

The pipeline produces a summary TSV file with the lead variant of each credible set found in the analysis. The summary contains a subset of the columns of the original summary statistics.

References

Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. (2020). A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society, Series B 82, 1273–1300. https://doi.org/10.1111/rssb.12388