runExomeDepth

This repo contains code to help call copy number variants (CNVs) from whole-exome sequencing (WES) or targeted sequencing data using ExomeDepth. The main script, ExomeDepth.R, analyses multiple samples at once against a set of baseline samples.

Getting Started

Input

To run this analysis you will need the following input:

a set of BAM files for which to call CNVs - one sample per BAM file
a set of BAM files to use as the baseline - one sample per BAM file
index files (.bai) for all of the above BAM files
a BED file of the target regions of your exome capture or targeted sequencing data. This is optional; if it is not supplied, hg19 will be used.
an annotation file (GTF/GFF). This must match the genome build of your targets. This is optional; if it is not supplied, Ensembl version 71 (hg19) will be used.

It is advisable to index the BAM files before running the pipeline. If an index file is older than its BAM file, ExomeDepth will throw an error. BAM files can be indexed with samtools:

samtools index input.bam # for a single sample or;
samtools index -M *.bam # multiple samples
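
Since a missing or outdated index stops the run, it can help to check the indexes up front. The following is a minimal sketch, not part of this repo: `check_bam_indexes` is a hypothetical helper, and it assumes the index sits next to the BAM as either `sample.bam.bai` or `sample.bai`.

```shell
# Hypothetical helper (not part of runExomeDepth): report BAM files whose
# index is missing or older than the BAM file itself.
check_bam_indexes() {
    local status=0 bam bai
    for bam in "$@"; do
        bai=""
        # Assumed index locations: sample.bam.bai or sample.bai
        if [ -e "${bam}.bai" ]; then
            bai="${bam}.bai"
        elif [ -e "${bam%.bam}.bai" ]; then
            bai="${bam%.bam}.bai"
        fi
        if [ -z "$bai" ]; then
            echo "MISSING index: $bam"
            status=1
        elif [ "$bam" -nt "$bai" ]; then
            # -nt is true when the BAM was modified after its index
            echo "STALE index: $bam"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_bam_indexes /path/to/bams/*.bam` prints nothing and returns 0 when every index is present and up to date; any file it reports can be re-indexed with `samtools index`.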

Dependencies

Miniconda
The remaining dependencies are installed with conda from the environment.yml file.

Installation

Clone the repository. From the command line, run the following command in the directory where you want to install the repo:

git clone --recursive https://github.com/egustavsson/runExomeDepth.git

Analysis steps

1. Create the conda environment

First you need to create the conda environment, which installs all the dependencies. This step only needs to be done once:

cd runExomeDepth
conda env create -f environment.yml

2. Activate the conda environment

Once the conda environment has been created, it must be activated before running the analysis:

cd runExomeDepth
conda activate runExomeDepth

When you are done with the analysis, you can deactivate the conda environment with:

conda deactivate

3. Install ExomeDepth and required R packages

While R and R-essentials are installed through the conda environment, the other required R packages, including ExomeDepth, are installed by running the script install-packages.R.

From the runExomeDepth directory, the script can be run like this:

Rscript install-packages.R

This script only needs to be run once; the installed R packages are saved within the conda environment.

4. Call CNVs with ExomeDepth

Input data

The main script for calling CNVs is ExomeDepth.R. Make sure you have the following input data before running it:

Parameter	Description
`--targets`	BED file with exon targets. Optional; if none is given, hg19 will be used
`--annotation`	GTF/GFF file with gene coordinates. Optional; if none is given, Ensembl version 71 (hg19) will be used
`--test-samples`	TSV file with paths to the BAM files to call CNVs for, one per line
`--baseline-samples`	TSV file with paths to the BAM files used for the baseline, one per line
`--output-directory`	Path to the output directory

The TSV files required by --test-samples and --baseline-samples look like this:

/path/to/test_sample1.bam
/path/to/test_sample2.bam
/path/to/test_sample3.bam

Example files are also included in the repository: test_samples.tsv and baseline-samples.tsv.
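
A sample list in the format above can also be generated from a directory of BAM files rather than typed by hand. This is a sketch under assumptions: `make_sample_list` is a hypothetical helper that is not part of this repo, and it assumes all BAM files for one list sit directly in a single directory.

```shell
# Hypothetical helper (not part of runExomeDepth): write the absolute path of
# every *.bam file in a directory to a sample list, one path per line.
make_sample_list() {
    local bam_dir out_file
    # Resolve the directory to an absolute path so the list works from anywhere
    bam_dir=$(cd "$1" && pwd) || return 1
    out_file=$2
    # -maxdepth 1 keeps the search to the directory itself; *.bam.bai is not matched
    find "$bam_dir" -maxdepth 1 -name '*.bam' ! -type d | sort > "$out_file"
}
```

For example, `make_sample_list /path/to/test_bams test_samples.tsv` followed by `make_sample_list /path/to/baseline_bams baseline_samples.tsv` would produce the two lists passed to the script.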

Run the script

Once you have the required input data, run the ExomeDepth.R script as follows:

Rscript ExomeDepth.R \
        --targets /path/to/targets.bed \
        --annotation /path/to/annotation.gff \
        --test-samples test_samples.tsv \
        --baseline-samples baseline_samples.tsv \
        --output-directory /path/to/output_folder/
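
Because the script takes its BAM paths from the two TSV files, a quick check that every listed path is readable can catch typos before the analysis starts. The sketch below is illustrative only: `validate_sample_list` is a hypothetical helper, not something provided by this repo or by ExomeDepth.

```shell
# Hypothetical helper (not part of runExomeDepth): verify that every path
# listed in a sample list points to a readable file.
validate_sample_list() {
    local list=$1 status=0 path
    if [ ! -r "$list" ]; then
        echo "cannot read sample list: $list" >&2
        return 1
    fi
    # The "|| [ -n ... ]" clause also handles a last line without a newline
    while IFS= read -r path || [ -n "$path" ]; do
        if [ -z "$path" ]; then
            continue    # skip blank lines
        fi
        if [ ! -r "$path" ]; then
            echo "not readable: $path" >&2
            status=1
        fi
    done < "$list"
    return "$status"
}
```

One way to use it is `validate_sample_list test_samples.tsv && validate_sample_list baseline_samples.tsv` before launching the Rscript command above.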

Output

The output is a CSV file with CNVs per sample and a log file. The CNVs are sorted in descending order by the BF column, which stands for Bayes factor and quantifies the statistical support for each CNV. It is the log10 of the likelihood ratio of the data under the CNV call versus the null (normal copy number). The higher that number, the more confident one can be about the presence of a CNV. Each CNV also has a column with the names of the overlapping genes.
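
Written out, the definition of the BF column can be expressed as follows, where D is an assumed shorthand (not used by ExomeDepth itself) for the read-count data in the called region:

```latex
\mathrm{BF} = \log_{10} \frac{P(D \mid \text{CNV})}{P(D \mid \text{normal copy number})}
```

Under this definition, a BF of 3 means the data are 10^3 = 1000 times more likely under the CNV call than under normal copy number, and each additional unit of BF corresponds to a further tenfold increase in that ratio.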
