Add conda env for bowtie2 and update README
SherineAwad committed Feb 19, 2022
1 parent a35b952 commit 502d768
Showing 2 changed files with 11 additions and 5 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -11,21 +11,23 @@ This is a GATK variant calling snakemake pipeline written by Sherine Awad.


We are using GATK4 GVCF mode. To run the pipeline, edit the config file to match your samples file name and reference genome.
Your sample names should be listed in the samples.tsv file by default. If needed, you can change this file name by editing the SAMPLES entry in the config file.
Your sample names should be listed in the samples.tsv file by default. If needed, you can change this file name by editing the **SAMPLES** entry in the *config file*.
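A minimal sketch of reading sample names through the **SAMPLES** config entry. It assumes, hypothetically, that the samples file lists one sample name per line (first tab-separated column, with blank lines and `#` comments skipped). `load_samples` is an illustrative helper, not code taken from this pipeline's Snakefile.

```python
from pathlib import Path
from typing import Dict, List


def load_samples(config: Dict[str, str]) -> List[str]:
    """Return sample names from the file named by config["SAMPLES"].

    Hypothetical format: one sample per line, name in the first
    tab-separated column; blank lines and '#' comments are skipped.
    """
    # Fall back to the README's default file name if SAMPLES is unset.
    samples_file = Path(config.get("SAMPLES", "samples.tsv"))
    names = []
    for line in samples_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line.split("\t")[0])
    return names
```

Inside a Snakefile the global `config` dictionary could be passed directly, e.g. `SAMPLES = load_samples(config)`.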

If samples are paired-end, the pipeline expects the suffixes ".r_1.fq.gz" and ".r_2.fq.gz".
Any prefix before this suffix is the sample name and should be written in "samples.tsv". For single-end reads, the sample suffix is ".fq.gz", and any prefix before this suffix is likewise written in "samples.tsv".
Any prefix before this suffix is the sample name and should be written in "samples.tsv". For single-end reads, the sample suffix is ".fq.gz", and any prefix before this suffix is likewise written in **"samples.tsv"**.
For example, if your sample file is named sample1.s_1.r_1.fq.gz, then the sample name in the samples file should be sample1.s_1.
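The naming rule above can be sketched as a small helper that strips the README's suffixes from FASTQ file names and writes the resulting names out. `derive_sample_names` and `write_samples_tsv` are hypothetical helpers, and the one-name-per-line output layout is an assumption about samples.tsv.

```python
from typing import Iterable, List

R1_SUFFIX = ".r_1.fq.gz"  # paired-end read 1 (from the README)
R2_SUFFIX = ".r_2.fq.gz"  # paired-end read 2 (from the README)
SE_SUFFIX = ".fq.gz"      # single-end reads (from the README)


def derive_sample_names(filenames: Iterable[str], paired: bool) -> List[str]:
    """Return sorted, de-duplicated sample names (the prefix before the suffix)."""
    names = set()
    for fn in filenames:
        if paired:
            for suffix in (R1_SUFFIX, R2_SUFFIX):
                if fn.endswith(suffix):
                    names.add(fn[: -len(suffix)])
                    break
        elif fn.endswith(SE_SUFFIX):
            names.add(fn[: -len(SE_SUFFIX)])
    return sorted(names)


def write_samples_tsv(names: Iterable[str], path: str = "samples.tsv") -> None:
    """Write one sample name per line (assumed samples.tsv layout)."""
    with open(path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
```

For example, `derive_sample_names(["sample1.s_1.r_1.fq.gz", "sample1.s_1.r_2.fq.gz"], paired=True)` returns `["sample1.s_1"]`. In single-end mode every `.fq.gz` file matches, so paired files should not be mixed into the same directory listing.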

You need to update the config file to indicate whether your samples are paired-end or single-end. If your samples are paired-end, set the PAIRED entry in the config file to TRUE; otherwise, set it to FALSE.
You need to update the config file to indicate whether your samples are paired-end or single-end. If your samples are paired-end, set the **PAIRED** entry in the config file to TRUE; otherwise, set the **PAIRED** entry to FALSE. You can change the **samples.tsv** name in the config file.
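To catch a mismatch between the PAIRED setting and the files on disk, a check along these lines could run before launching Snakemake. It only encodes the suffix convention stated in this README; `fastq_dir` and both function names are hypothetical, since the commit does not show where the pipeline reads its raw FASTQ files from.

```python
from pathlib import Path
from typing import Dict, List


def expected_fastqs(sample: str, paired: bool) -> List[str]:
    """Raw FASTQ names implied by the README's naming convention."""
    if paired:
        return [f"{sample}.r_1.fq.gz", f"{sample}.r_2.fq.gz"]
    return [f"{sample}.fq.gz"]


def missing_fastqs(samples: List[str], paired: bool,
                   fastq_dir: str = ".") -> Dict[str, List[str]]:
    """Map each sample to the expected FASTQ files that are not on disk."""
    missing = {}
    for sample in samples:
        absent = [
            name
            for name in expected_fastqs(sample, paired)
            if not (Path(fastq_dir) / name).exists()
        ]
        if absent:
            missing[sample] = absent
    return missing
```

The `paired` argument would come from `config['PAIRED']`, the same key the Snakefile branches on.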

You need to update your interval list by editing the intervals.list file so that it lists only the chromosomes of interest. You can change the name of this file by editing the INTERVALS entry in the config file.
You need to update your interval list by editing the **intervals.list** file so that it lists only the chromosomes of interest. You can change the name of this file by editing the INTERVALS entry in the config file.
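GATK interval `.list` files hold one interval per line, either a bare contig name (e.g. `chr1`) or `contig:start-end`. A hedged sketch that flags entries in intervals.list whose contig is missing from the reference's samtools `.fai` index (tab-separated, contig name in the first column) might look like this; the file paths and function names are illustrative assumptions.

```python
import re
from typing import List, Set


def read_fai_contigs(fai_path: str) -> Set[str]:
    """Contig names from a samtools .fai index (first tab-separated column)."""
    with open(fai_path) as handle:
        return {line.split("\t")[0] for line in handle if line.strip()}


def parse_interval_contig(entry: str) -> str:
    """Return the contig of an interval written as 'chr1' or 'chr1:100-200'."""
    # Strip an optional trailing ':start' or ':start-end' range.
    match = re.fullmatch(r"(.+):(\d+)(?:-(\d+))?", entry)
    return match.group(1) if match else entry


def unknown_contigs(intervals_path: str, fai_path: str) -> List[str]:
    """Interval entries whose contig is absent from the reference index."""
    known = read_fai_contigs(fai_path)
    bad = []
    with open(intervals_path) as handle:
        for line in handle:
            entry = line.strip()
            if entry and parse_interval_contig(entry) not in known:
                bad.append(entry)
    return bad
```

A naming mismatch such as `1` versus `chr1` between the interval list and the reference is a common cause of GATK interval errors, which is what this check surfaces.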

The pipeline automatically pulls the resources needed by GATK from the Broad Institute resource bundles.
The pipeline uses **Annovar** for annotations.


You can change the samples.tsv name in the config file.
We use hard filtering, but you can always pass the VCF to VariantRecalibrator instead. You can change the hard filter parameters in the *config file*.
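Hard filtering in GATK4 is typically applied with `VariantFiltration`, pairing each `--filter-expression` with a `--filter-name`. The sketch below builds such a command from a name-to-expression mapping. The thresholds are illustrative values modeled on GATK's commonly cited SNP hard filters, not this pipeline's defaults, and the dictionary stands in for config entries whose real key names are not shown here.

```python
from typing import Dict, List

# Illustrative placeholders only; the pipeline's own thresholds live in
# its config file under keys not shown in this commit.
EXAMPLE_FILTERS: Dict[str, str] = {
    "QD2": "QD < 2.0",
    "FS60": "FS > 60.0",
    "MQ40": "MQ < 40.0",
}


def variant_filtration_cmd(reference: str, vcf_in: str, vcf_out: str,
                           filters: Dict[str, str]) -> List[str]:
    """Build a GATK4 VariantFiltration argument list, one name per expression."""
    cmd = ["gatk", "VariantFiltration",
           "-R", reference, "-V", vcf_in, "-O", vcf_out]
    for name, expression in filters.items():
        # GATK pairs the Nth --filter-name with the Nth --filter-expression.
        cmd += ["--filter-expression", expression, "--filter-name", name]
    return cmd
```

The resulting list can be handed to `subprocess.run(..., check=True)`. Keeping the filters in a mapping mirrors the README's approach of tuning them in the config file rather than in the rule itself.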

### Run the pipeline

snakemake -jn
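`-jn` presumably stands for `-j` followed by the number of jobs. Because this commit adds `conda:` directives to the bowtie2 rules, those environments are only created and used when Snakemake runs with `--use-conda` (newer releases also offer `--software-deployment-method conda`). A small sketch of assembling the command; the helper name is hypothetical.

```python
import shlex
from typing import List


def snakemake_cmd(jobs: int, use_conda: bool = True,
                  dry_run: bool = False) -> List[str]:
    """Build a snakemake invocation; -j sets the number of parallel jobs."""
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    cmd = ["snakemake", "-j", str(jobs)]
    if use_conda:
        # Without this flag Snakemake ignores `conda:` directives such as
        # conda: 'env/env-align.yaml' and uses tools already on PATH.
        cmd.append("--use-conda")
    if dry_run:
        cmd.append("-n")  # list the jobs that would run without executing them
    return cmd


print(shlex.join(snakemake_cmd(8)))
```

A dry run first (`dry_run=True`) is a cheap way to confirm the sample names and config entries resolve before any alignment starts.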

4 changes: 4 additions & 0 deletions Snakefile
@@ -132,6 +132,8 @@ if config['PAIRED']:
r2 = "galore/{sample}.r_2_val_2.fq.gz"
params:
genome = config['GENOME']
benchmark: "logs/{sample}.bowtie2.benchmark"
conda: 'env/env-align.yaml'
output:
"{sample}.sam"
shell:
@@ -162,6 +164,8 @@ else:
"galore/{sample}_trimmed.fq.gz"
params:
genome = config['GENOME']
benchmark: "logs/{sample}.bowtie2.benchmark"
conda: 'env/env-align.yaml'
output:
"{sample}.sam"
shell:
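The `benchmark:` directive added here makes Snakemake write a tab-separated file per job (for example logs/sample1.bowtie2.benchmark) with wall-clock time and resource usage. Its exact columns vary between Snakemake versions, so this sketch assumes only the `s` (seconds) column and keeps any other numeric fields it finds; `read_benchmark` is a hypothetical helper.

```python
import csv
from typing import Dict, List


def read_benchmark(path: str) -> List[Dict[str, float]]:
    """Parse a Snakemake benchmark TSV into rows of numeric fields.

    Only the 's' (wall-clock seconds) column is assumed to exist; other
    columns (e.g. max_rss) are kept when they parse as numbers.
    """
    rows = []
    with open(path, newline="") as handle:
        for record in csv.DictReader(handle, delimiter="\t"):
            parsed = {}
            for key, value in record.items():
                try:
                    parsed[key] = float(value)
                except (TypeError, ValueError):
                    continue  # skip 'h:m:s' strings, 'NA' and malformed cells
            rows.append(parsed)
    return rows
```

Comparing the `s` values across samples is a quick way to spot bowtie2 jobs that ran unusually long.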