This repo contains code to call CNVs from WES data using ExomeDepth

egustavsson/runExomeDepth

runExomeDepth

This repo contains code to help call CNVs from WES or targeted sequencing data using ExomeDepth. The main script, ExomeDepth.R, analyses multiple samples in one run against a set of baseline samples.

Getting Started

Input

To run this analysis you will need the following input:

  • a set of BAM files for which to call CNVs - one sample per BAM file
  • a set of BAM files to use as the baseline - one sample per BAM file
  • index files (.bai) for all of the above BAM files
  • a BED file of the target regions of your exome capture or targeted sequencing data. If this is not supplied, hg19 will be used.
  • an annotation file (GTF/GFF). This needs to match the build of your targets. If this is not supplied, Ensembl version 71 (hg19) will be used.

It is advisable to index the BAM files before running the pipeline. If an index file is older than its BAM file, ExomeDepth will throw an error. BAM files can be indexed with:

samtools index input.bam # for a single sample or;
samtools index -M *.bam # multiple samples
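
Because ExomeDepth fails when an index is older than its BAM file, it can help to check the timestamps before starting a run. The following is a minimal sketch and not part of this repo: the function name find_stale_indexes is hypothetical, and it assumes each index sits next to its BAM file as <name>.bam.bai, the default name written by samtools index.

```shell
#!/bin/sh
# Hypothetical helper (not part of runExomeDepth): print every BAM file whose
# index is missing or older than the BAM file itself.
# Assumption: the index is named <file>.bam.bai and sits next to the BAM file.
find_stale_indexes() {
    for bam in "$@"; do
        bai="${bam}.bai"
        if [ ! -f "$bai" ]; then
            # No index at all
            echo "$bam"
        elif [ "$bam" -nt "$bai" ]; then
            # BAM was modified after the index was written
            echo "$bam"
        fi
    done
}
```

For example, find_stale_indexes /path/to/*.bam prints the files that should be passed to samtools index again.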

Dependencies

  • miniconda
  • The rest of the dependencies are installed via conda through the environment.yml file

Installation

Clone the repository by running the following command from the directory where you want to install it:

git clone --recursive https://github.com/egustavsson/runExomeDepth.git

Analysis steps

1. Create the conda environment

First you need to create the conda environment, which installs all the dependencies. This step only needs to be done once:

cd runExomeDepth
conda env create -f environment.yml

2. Activate the conda environment

Once the conda environment has been created, it must be activated before running the analysis:

cd runExomeDepth
conda activate runExomeDepth
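
A wrapper script can guard against running the analysis outside the environment. This is a minimal sketch under one assumption: it relies on the CONDA_DEFAULT_ENV variable that conda activate sets to the name of the active environment. The function name require_conda_env is hypothetical and not part of this repo.

```shell
#!/bin/sh
# Hypothetical guard (not part of runExomeDepth): succeed only if the named
# conda environment is currently active.
# Assumption: `conda activate` has set CONDA_DEFAULT_ENV to the environment name.
require_conda_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
        echo "conda environment '$1' is not active; run: conda activate $1" >&2
        return 1
    fi
}
```

Calling require_conda_env runExomeDepth || exit 1 at the top of a wrapper script stops it early when the environment has not been activated.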

Once you are done with the analysis, you can deactivate the conda environment with:

conda deactivate

3. Install ExomeDepth and required R packages

While R and R-essentials are installed through the conda environment, the other required R packages, including ExomeDepth, are installed by running the script install-packages.R.

From the ./runExomeDepth directory, the script can be run like this:

Rscript install-packages.R

This script only needs to be run once; the installed R packages are saved within the conda environment.

4. Call CNVs with ExomeDepth

Input data

The main script for calling CNVs is ExomeDepth.R. Make sure you have the following input data before running it:

Parameter             Description
--targets             BED file with exon targets. Optional; if none is given, hg19 will be used
--annotation          GTF/GFF file with gene coordinates. Optional; if none is given, Ensembl version 71 (hg19) will be used
--test-samples        TSV file with paths to the BAM files to call CNVs for, one per line
--baseline-samples    TSV file with paths to the BAM files used for the baseline, one per line
--output-directory    Path to the output directory

The TSV files required by --test-samples and --baseline-samples look like this:

/path/to/test_sample1.bam
/path/to/test_sample2.bam
/path/to/test_sample3.bam

Example files are also provided in this repo: test_samples.tsv and baseline_samples.tsv.
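
If all BAM files for one group sit in a single directory, the list can be generated rather than typed. The following is a minimal sketch assuming that layout. The function name make_sample_list is hypothetical and not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper (not part of runExomeDepth): write the paths of all
# *.bam files directly inside a directory to a list file, one path per line.
make_sample_list() {
    dir="$1"
    out="$2"
    # -maxdepth 1 restricts the search to the directory itself;
    # sort gives a stable, reproducible order.
    find "$dir" -maxdepth 1 -type f -name '*.bam' | sort > "$out"
    # Return non-zero if the list is empty, so an empty list is not passed on silently.
    [ -s "$out" ]
}
```

For example, make_sample_list /path/to/test_bams test_samples.tsv writes one line per BAM file. The paths are prefixed with the directory exactly as given, so passing an absolute directory yields absolute paths like those in the example above.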

Run the script

Once you have the required input data, run the ExomeDepth.R script like this:

Rscript ExomeDepth.R \
        --targets /path/to/targets.bed \
        --annotation /path/to/annotation.gff \
        --test-samples test_samples.tsv \
        --baseline-samples baseline_samples.tsv \
        --output-directory /path/to/output_folder/
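
Before a run, it may be worth confirming that no BAM file appears in both sample lists. This README does not say how the script treats such overlap, so the check below is a precaution under the assumption that the baseline should be independent of the test samples. The function name shared_samples is hypothetical and not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper (not part of runExomeDepth): print every path that is
# listed in both sample files. Blank lines are ignored.
shared_samples() {
    {
        # De-duplicate each list on its own first, so a path repeated within
        # a single file is not mistaken for an overlap between the two files.
        grep -v '^$' "$1" | sort -u
        grep -v '^$' "$2" | sort -u
    } | sort | uniq -d
}
```

For example, shared_samples test_samples.tsv baseline_samples.tsv prints nothing when the two lists are disjoint.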

Output

The output is a CSV file with CNVs per sample and a log file. The CNVs are sorted in descending order by the BF column, which stands for Bayes factor and quantifies the statistical support for each CNV. It is the log10 of the likelihood ratio of the data under the CNV call to the data under the null (normal copy number). The higher that number, the more confident one can be about the presence of a CNV. Each CNV also has a column listing the names of the overlapping genes.
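
Written out, the definition of BF corresponds to the following expression. The symbol D, standing for the observed data in the called region, is notation introduced here for illustration only:

```latex
\mathrm{BF} = \log_{10} \frac{P(D \mid \text{CNV})}{P(D \mid \text{normal copy number})}
```

For example, BF = 10 means the data are 10^10 times more likely under the CNV call than under normal copy number, whereas BF = 1 corresponds to a likelihood ratio of only 10.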
