🐎 tweaked mem params
📖 updated readme and docs, added anticipated release info
migbro committed Jun 18, 2020
1 parent 5dcd21a commit c8f392f
Showing 7 changed files with 24 additions and 24 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -11,7 +11,7 @@ For all workflows, input bams should be indexed beforehand. This tool is provid
The overall [workflow](https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-) picks up from post-STAR alignment, starting at Picard mark duplicates.
For the most part, tool parameters follow defaults from the GATK Best Practices [WDL](https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels/blob/master/gatk4-rna-best-practices.wdl), written in cwl with added optimization for use on the Cavatica platform.
`workflows/d3b_gatk_rnaseq_snv_wf.cwl` is the wrapper cwl used to run all tools for GATK4.
Run time (n=1) ~5 hours, cost on cavatica ~$5
Run time (n=1) ~12 hours, cost on cavatica ~$5

### Inputs
```yaml
@@ -42,7 +42,6 @@ outputs:
### Docker Pulls
- `kfdrc/sambamba:0.7.1`
- `kfdrc/gatk:4.1.7.0R`
- `kfdrc/python:2.7.13`

### Workflow Diagram

8 changes: 4 additions & 4 deletions notebooks/RNAseq_SNV_WF_DEV.ipynb
@@ -34,8 +34,8 @@
"outputs": [],
"source": [
"project = \"d3b-bixu/dev-rnaseq-snv\"\n",
"task_id = \"8cfb180d-f9f4-4268-bea4-7581bd9cb05e\"\n",
"out_file = open(\"/Users/brownm28/Documents/2020-Apr-8_RNAseq_snv_dev/gatk4.tsv\", \"w\")\n",
"task_id = \"8b37cb5b-ef56-4f3b-a7c9-87a89b07171f\"\n",
"out_file = open(\"/Users/brownm28/Documents/2020-Apr-8_RNAseq_snv_dev/2020-06-16_gatk4.tsv\", \"w\")\n",
"# task_id = \"3c20cc8e-18d7-43f2-bc2c-4a76d38a88f8\"\n",
"task = api.tasks.get(task_id)\n",
"jobs = {}\n",
@@ -90,8 +90,8 @@
"\n",
"# max desired col width\n",
"max_w = 200\n",
"tsv_in = open(\"/Users/brownm28/Documents/2020-Apr-8_RNAseq_snv_dev/gatk4.tsv\")\n",
"out_md = open(\"/Users/brownm28/Documents/2020-Apr-8_RNAseq_snv_dev/gatk4.md\", \"w\")\n",
"tsv_in = open(\"/Users/brownm28/Documents/2020-Apr-8_RNAseq_snv_dev/2020-06-16_gatk4.tsv\")\n",
"out_md = open(\"/Users/brownm28/Documents/2020-Apr-8_RNAseq_snv_dev/2020-06-16_gatk4.md\", \"w\")\n",
"data = []\n",
"max_len = []\n",
"\n",
5 changes: 3 additions & 2 deletions tools/gatk_applybqsr.cwl
@@ -7,14 +7,15 @@ requirements:
- class: DockerRequirement
dockerPull: 'kfdrc/gatk:4.1.7.0R'
- class: ResourceRequirement
ramMin: 4000
coresMin: 2
ramMin: 8000
coresMin: 8
baseCommand: [/gatk, ApplyBQSR]
arguments:
- position: 1
shellQuote: false
valueFrom: >-
--java-options "-Xms3000m
-Xmx7500m
-XX:+PrintFlagsFinal
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
4 changes: 2 additions & 2 deletions tools/gatk_splitncigarreads.cwl
@@ -5,7 +5,7 @@ requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: ResourceRequirement
ramMin: 16000
ramMin: 32000
coresMin: 8
- class: DockerRequirement
dockerPull: 'kfdrc/gatk:4.1.7.0R'
@@ -14,7 +14,7 @@ arguments:
- position: 1
shellQuote: false
valueFrom: >-
--java-options "-Xmx16G
--java-options "-Xmx30G
-XX:+PrintFlagsFinal
-Xloggc:gc_log.log
-XX:GCTimeLimit=50
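Review note: the memory changes in `tools/gatk_applybqsr.cwl` and `tools/gatk_splitncigarreads.cwl` pair each `ramMin` with a JVM `-Xmx` cap. The sketch below is only an illustration of how the gap between the two can be checked. It assumes `ramMin` is read in mebibytes (as the CWL `ResourceRequirement` field is defined) and that JVM size suffixes are binary units. The helper names `xmx_to_mib` and `headroom_mib` are hypothetical and not part of this repo.

```python
import re

# Multipliers converting a JVM size suffix to MiB (JVM sizes are binary units).
_SUFFIX_TO_MIB = {"k": 1 / 1024, "m": 1, "g": 1024}


def xmx_to_mib(java_options: str) -> float:
    """Return the -Xmx value found in a java-options string, in MiB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_options)
    if match is None:
        raise ValueError("no -Xmx flag with a k/m/g suffix found")
    size, suffix = match.groups()
    return int(size) * _SUFFIX_TO_MIB[suffix.lower()]


def headroom_mib(ram_min_mib: int, java_options: str) -> float:
    """MiB left for non-heap JVM memory after the max heap is subtracted."""
    return ram_min_mib - xmx_to_mib(java_options)


if __name__ == "__main__":
    # (ramMin, java-options) pairs copied from the hunks in this commit.
    settings = {
        "splitncigarreads (before)": (16000, "-Xmx16G"),
        "splitncigarreads (after)": (32000, "-Xmx30G"),
        "applybqsr (after)": (8000, "-Xms3000m -Xmx7500m"),
    }
    for name, (ram_min, opts) in settings.items():
        print(f"{name}: {headroom_mib(ram_min, opts):+.0f} MiB headroom")
```

Under those assumptions, the previous SplitNCigarReads setting allowed a heap slightly larger than the reserved memory, while both new settings leave a positive margin. Whether that margin is enough for off-heap usage depends on the tool and input, which this sketch does not measure.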
24 changes: 12 additions & 12 deletions workflows/d3b_gatk_rnaseq_snv_wf.cwl
@@ -5,38 +5,38 @@ doc: |-

The overall [workflow](https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-) picks up from post-STAR alignment, starting at Picard mark duplicates.
For the most part, tool parameters follow defaults from the GATK Best Practices [WDL](https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels/blob/master/gatk4-rna-best-practices.wdl), written in cwl with added optimization for use on the Cavatica platform.
The git repo serving this app and related tools can be found [here](https://github.com/d3b-center/d3b-dev-rnaseq-snv).
`workflows/d3b_gatk_rnaseq_snv_wf.cwl` is the wrapper cwl used to run all tools for GAT4.
The git repo serving this app and related tools can be found [here](https://github.com/d3b-center/d3b-dev-rnaseq-snv/releases/tag/0.3).
`workflows/d3b_gatk_rnaseq_snv_wf.cwl` is the wrapper cwl used to run all tools for GATK4.

### Inputs
```yaml
inputs:
output_basename: string
pass_thru: {type: boolean, doc: "Param for whether to skip name sort step before markd dup if source is already name sorted", default: false}
scatter_ct: {type: int?, doc: "Number of interval lists to split into", default: 50}
STAR_sorted_genomic_bam: {type: File, doc: "STAR sorted alignment bam", secondaryFiles: ['^.bai']}
sample_name: string
reference_fasta: {type: File, secondaryFiles: ['.fai', '^.dict'], doc: "Reference genome used"}
reference_fasta: {type: File, secondaryFiles: ['^.dict', '.fai'], doc: "Reference genome used"}
reference_dict: File
vardict_min_vaf: {type: ['null', float], doc: "Min variant allele frequency for vardict to consider. Recommend 0.2", default: 0.2}
vardict_cpus: {type: ['null', int], default: 4}
vardict_ram: {type: ['null', int], default: 8, doc: "In GB"}
call_bed_file: {type: File, doc: "BED or GTF intervals to make calls"}
exome_flag: {type: string?, default: "Y", doc: "Whether to run in exome mode for callers. Should be Y or leave blank as default is Y. Only make N if you are certain"}
knownsites: {type: 'File[]', doc: "Population vcfs, based on Broad best practices"}
dbsnp_vcf: {type: File, secondaryFiles: ['.idx']}
tool_name: {type: string, doc: "description of tool that generated data, i.e. gatk_haplotypecaller"}
padding: {type: ['null', int], doc: "Padding to add to input intervals, recommend 0 if intervals already padded, 150 if not", default: 150}
mode: {type: ['null', {type: enum, name: select_vars_mode, symbols: ["gatk", "grep"]}], doc: "Choose 'gatk' for SelectVariants tool, or 'grep' for grep expression", default: "gatk"}
```

### Outputs
```yaml
outputs:
haplotype_called__vcf: {type: File, outputSource: merge_hc_vcf/merged_vcf, doc: "Haplotype Caller called vcf, after genotyping"}
filtered_vcf: {type: File, outputSource: gatk_filter_vcf/filtered_vcf, doc: "Called vcf after Broad-recommended hard filters applied"}
filtered_hc_vcf: {type: File, outputSource: gatk_filter_vcf/filtered_vcf, doc: "Haplotype called vcf with Broad-recommended FILTER values added"}
pass_vcf: {type: File, outputSource: gatk_pass_vcf/pass_vcf, doc: "Filtered vcf selected for PASS variants"}
anaylsis_ready_bam: {type: File, outputSource: gatk_applybqsr/recalibrated_bam, doc: "Duplicate marked, Split N trimmed CIGAR BAM, BQSR recalibratede, ready for RNAseq calling"}
bqsr_table: {type: File, outputSource: gatk_baserecalibrator/output, doc: "BQSR table"}
```

### Docker Pulls
- `kfdrc/sambamba:0.7.1`
- `kfdrc/gatk:4.1.1.0`
- `kfdrc/python:2.7.13`
- `kfdrc/gatk:4.1.7.0R`

### Simulated bash calls
An example of bash calls from each step can be found in the [git repo](https://github.com/d3b-center/d3b-dev-rnaseq-snv#gatk4-simulated-bash-calls)
2 changes: 1 addition & 1 deletion workflows/d3b_strelka2_rnaseq_snv_wf.cwl
@@ -4,7 +4,7 @@ doc: |-
Strelka2 SNV Calling Workflow

This [workflow](https://github.com/Illumina/strelka/blob/v2.9.x/docs/userGuide/README.md#rna-seq) is pretty straightforward, with a `PASS` filter step added to get `PASS` calls.
The git repo serving this app and related tools can be found [here](https://github.com/d3b-center/d3b-dev-rnaseq-snv).
The git repo serving this app and related tools can be found [here](https://github.com/d3b-center/d3b-dev-rnaseq-snv), compatible with all releases as of 2020-Jun-17.
`workflows/d3b_strelka2_rnaseq_snv_wf.cwl` is the wrapper cwl that runs this workflow

### Inputs
2 changes: 1 addition & 1 deletion workflows/d3b_vardict_rnaseq_snv_wf.cwl
@@ -3,7 +3,7 @@ class: Workflow
doc: |-
VarDict Java RNAseq SNV Calling Workflow

This [workflow](https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/rnaseq/variation.py) is based on the Vardict run style of BC Bio.
This [workflow](https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/rnaseq/variation.py) is based on the Vardict run style of BC Bio, compatible with all releases as of 2020-Jun-17.
`workflows/d3b_vardict_rnaseq_snv_wf.cwl` is the wrapper cwl that runs this workflow.
Tweaking `vardict_bp_target` and `vardict_intvl_target_size` may be needed to improve run time in high coverage areas, by reducing their values from defaults.
