Add missing uuid to config
Fix bad unpacking
Add sample URLs to logger
Add suffix option to test
Remove unused import
Move google group line to parent README
Add whitespace before map_job
Add comment explaining map_job
Fix missing trim attribute in test
Clean up logic for reference indices
Change point of contact to google group
Improve legibility with comments on their own lines
Unindent FollowOn job
Add logging for index creation
Add inline explaining alt file can't be generated
Clarify that FAI is created separately from BWA indices
Add docstring to move_file_job
Add inline comment to explain disk space
Reverse use of csv
Replace run_bwakit arguments with config
Fix imports and spacing in bwa_alignment
Add /data/ prefix
Add missing import
Move run_bwa_index and run_samtools_faidx to indexing.py
Remove unnecessary renaming
Colocate parameter logic
Add inline comments for conditionals and rg
Provide clarity for rg_line
Move required_length to common lib
Remove bad print statement
Add comments explaining conditional
Clarify samples type
Clarify file_size option
Remove attempt at humor
(+5 squashed commits)
Squashed commits:
Move bwa_kit to tool library (resolves #297)
Add bwa_index and samtools_faidx to tool library
Indent Job.Runner, add sanity checks
options -> args (convention)
Print docstring help if no arguments provided
Remove config options
Change config formatting
Add tab-friendly config and manifest names
Replace old bwa_kit job with download_sample_and_align
Clean imports
Move top docstring to main()
Fix test call
Fix adam_gatk_pipeline's call to download_reference_files
Add job version of `move_files`
Update README.md
Remove deprecated launch scripts
Require only reference (resolves #320)
Change download_shared_files -> download_reference_files
Remove old reference requirements
Add single end support for BWA (resolves #322)
Edit manifest docs to include single-end
Add nargs range for single-end and paired-end
Replace parse_config job with parse_manifest function
Showing 17 changed files with 520 additions and 360 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,47 +1,146 @@ | ||
## University of California, Santa Cruz Genomics Institute | ||
### GATK-compatible Alignment | ||
### Guide: Running the BWA Pipeline using Toil | ||
|
||
This guide attempts to walk the user through running this pipeline from start to finish. | ||
|
||
If you find any errors or corrections please feel free to make a pull request. Feedback of any kind is appreciated. | ||
|
||
If there are any questions please contact John Vivian ([email protected]). | ||
If you find any errors or corrections please feel free to make a pull request. | ||
Feedback of any kind is appreciated. | ||
|
||
## Overview | ||
|
||
This pipeline accepts two fastq files (by URL) to be aligned into a BAMFILE, which is the final output of the pipeline. | ||
A launch script is provided for 4 different references (b37, hg19, hg38, and hg38 no alternative loci). | ||
Fastqs are aligned to create a BAM that is compatible with GATK. | ||
|
||
## Installation | ||
|
||
Toil-scripts is now pip installable! `pip install toil-scripts` for a toil-stable version | ||
or `pip install --pre toil-scripts` for cutting edge development version. | ||
|
||
Type `toil-bwa` to get a basic help menu and instructions. | |
|
||
To decrease the chance of versioning conflicts, install toil-scripts into a virtualenv: | ||
|
||
- `virtualenv ~/toil-scripts` | ||
- `source ~/toil-scripts/bin/activate` | ||
- `pip install toil` | ||
- `pip install toil-scripts` | ||
|
||
If Toil is already installed globally (true for CGCloud users), or there are global dependencies (like Mesos), | ||
use virtualenv's `--system-site-packages` flag. | ||
|
||
## Dependencies | ||
|
||
This pipeline has been tested on Ubuntu 14.04, but should also run on other Unix-based systems. `apt-get` and `pip` | |
often require `sudo` privilege, so if the below commands fail, try prepending `sudo`. If you do not have sudo | ||
privileges you will need to build these tools from source, or bug a sysadmin (they don't mind). | ||
often require `sudo` privilege, so if the below commands fail, try prepending `sudo`. If you do not have `sudo` | ||
privileges you will need to build these tools from source, or bug a sysadmin about how to get them. | ||
|
||
#### General Dependencies | ||
|
||
1. Python 2.7 | ||
2. Curl apt-get install curl | ||
3. Docker http://docs.docker.com/engine/installation/ | ||
2. Curl apt-get install curl | ||
3. Docker http://docs.docker.com/engine/installation/ | ||
|
||
#### Python Dependencies | ||
|
||
1. Toil pip install toil | ||
2. S3AM pip install --pre s3am (optional, for upload of BAMFILE to S3) | ||
1. Toil pip install toil | ||
2. S3AM pip install --pre s3am (optional, needed for uploading output to S3) | ||
|
||
## Inputs | ||
|
||
The BWA pipeline requires input files in order to run. The only required input, aside from the sample(s), is a | ||
reference genome. The pipeline can be sped up by specifying URLs for the reference index files, which are generated | ||
with `bwa index` and `samtools faidx`. | ||
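
As a rough illustration of which index files the config can point to, the sketch below derives the six index URLs from a reference URL. It assumes the index files sit next to the reference and are named by appending the suffix to the full fasta name, as in the example config (`hg19.fa.amb`, `hg19.fa.fai`, and so on). The helper name `index_urls` is hypothetical and is not part of toil-scripts.

```python
# Suffixes written by `bwa index` (five files) and `samtools faidx` (one file).
BWA_INDEX_SUFFIXES = ("amb", "ann", "bwt", "pac", "sa")
FAIDX_SUFFIX = "fai"


def index_urls(ref_url):
    """Map each index config key to a URL built from the reference URL.

    Assumption: every index file lives beside the reference and is named
    <reference>.<suffix>. Adjust if your files are stored elsewhere.
    """
    suffixes = BWA_INDEX_SUFFIXES + (FAIDX_SUFFIX,)
    return {suffix: "{}.{}".format(ref_url, suffix) for suffix in suffixes}


if __name__ == "__main__":
    ref = "s3://cgl-pipeline-inputs/alignment/hg19.fa"
    # Print lines that can be pasted into the config, one key per index file.
    for key, url in sorted(index_urls(ref).items()):
        print("{}: {}".format(key, url))
```

If any of these keys is left blank in the config, the pipeline generates that index itself, which costs extra run time.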
|
||
## General Usage | ||
|
||
## Output | ||
1. Type `toil-bwa generate` to create an editable manifest and config in the current working directory. | ||
2. Parameterize the pipeline by editing the config. | ||
3. Fill in the manifest with information pertaining to your samples. | ||
4. Type `toil-bwa run [jobStore]` to execute the pipeline. | ||
|
||
This pipeline produces a BAMFILE for a given sample. | ||
## Example Commands | ||
|
||
## Running / Help | ||
Run sample(s) locally using the manifest | ||
1. `toil-bwa generate` | ||
2. Fill in config and manifest | ||
3. `toil-bwa run ./example-jobstore` | ||
|
||
It is recommended to use the associated launch scripts which provide default arguments needed to run the pipeline. | ||
It is likely that the job store positional argument, `--workDir`, and `--output-dir` arguments will need to be modified. | ||
To run a pipeline after dependencies have been installed, simply: | ||
Toil options can be appended to `toil-bwa run`, for example: | ||
`toil-bwa run ./example-jobstore --retryCount=1 --workDir=/data` | ||
|
||
* `git clone https://github.com/BD2KGenomics/toil-scripts` | ||
* `/toil-scripts/src/toil_scripts/batch_alignment/launch_bwa_hg38_no_alt.sh` | ||
For a complete list of Toil options, just type `toil-bwa run -h` | ||
|
||
Run a variety of samples locally | ||
1. `toil-bwa generate-config` | ||
2. Fill in config | ||
3. `toil-bwa run ./example-jobstore --retryCount=1 --workDir=/data --sample \ | ||
test-uuid file:///full/path/to/read1.fq.gz file:///full/path/to/read2.fq.gz` | ||
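
The `--sample` value shown above is a UUID followed by read URLs. The commit message mentions single-end support and an "nargs range", so the sketch below assumes a UUID followed by either one URL (single-end) or two URLs (paired-end). The function `parse_sample` is a hypothetical helper for checking a sample before launching a run. It is not the pipeline's own parser.

```python
def parse_sample(values):
    """Split a --sample value list into (uuid, urls).

    Assumption: the list is a UUID followed by one URL (single-end)
    or two URLs (paired-end).
    """
    if len(values) not in (2, 3):
        raise ValueError(
            "expected a UUID followed by one or two URLs, got {} value(s)".format(len(values))
        )
    uuid, urls = values[0], list(values[1:])
    for url in urls:
        # A loose check only: the config lists http://, file://, s3:// and gnos:// schemes.
        if "://" not in url:
            raise ValueError("not a URL: {!r}".format(url))
    return uuid, urls


if __name__ == "__main__":
    uuid, urls = parse_sample([
        "test-uuid",
        "file:///full/path/to/read1.fq.gz",
        "file:///full/path/to/read2.fq.gz",
    ])
    print(uuid, "paired-end" if len(urls) == 2 else "single-end")
```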
|
||
## Example Config | ||
|
||
``` | ||
# BWA Alignment Pipeline configuration file | ||
# This configuration file is formatted in YAML. Simply write the value (at least one space) after the colon. | ||
# Edit the values in this configuration file and then rerun the pipeline: "toil-bwa run" | ||
# URLs can take the form: http://, file://, s3://, gnos://. | ||
# Comments (beginning with #) do not need to be removed. Optional parameters may be left blank | ||
############################################################################################################## | ||
# Required: Reference fasta file | ||
ref: s3://cgl-pipeline-inputs/alignment/hg19.fa | ||
# Required: Output location of sample. Can be full path to a directory or an s3:// URL | ||
output-dir: /data/ | ||
# Required: The library entry to go in the BAM read group. | ||
library: Illumina | ||
# Required: Platform to put in the read group | ||
platform: Illumina | ||
# Required: Program Unit for BAM header. Required for use with GATK. | ||
program_unit: 12345 | ||
# Required: Approximate input file size. Provided as a number followed by (base-10) [TGMK]. E.g. 10M, 150G | ||
file-size: 50G | ||
# Optional: If true, sorts bam | ||
sort: True | ||
# Optional. If true, trims adapters | ||
trim: false | ||
# Optional: Reference fasta file (amb) -- if not present will be generated | ||
amb: s3://cgl-pipeline-inputs/alignment/hg19.fa.amb | ||
# Optional: Reference fasta file (ann) -- If not present will be generated | ||
ann: s3://cgl-pipeline-inputs/alignment/hg19.fa.ann | ||
# Optional: Reference fasta file (bwt) -- If not present will be generated | ||
bwt: s3://cgl-pipeline-inputs/alignment/hg19.fa.bwt | ||
# Optional: Reference fasta file (pac) -- If not present will be generated | ||
pac: s3://cgl-pipeline-inputs/alignment/hg19.fa.pac | ||
# Optional: Reference fasta file (sa) -- If not present will be generated | ||
sa: s3://cgl-pipeline-inputs/alignment/hg19.fa.sa | ||
# Optional: Reference fasta file (fai) -- If not present will be generated | ||
fai: s3://cgl-pipeline-inputs/alignment/hg19.fa.fai | ||
# Optional: (string) Path to Key File for SSE-C Encryption | ||
ssec: | ||
# Optional: Use instead of library, program_unit, and platform. | ||
rg-line: | ||
# Optional: Alternate file for reference build (alt). Necessary for alt aware alignment | ||
alt: | ||
# Optional: If true, runs the pipeline in mock mode, generating a fake output bam | ||
mock-mode: | ||
``` | ||
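
The `file-size` value is described as a number followed by a base-10 suffix from `[TGMK]`. A minimal sketch of converting such a string to bytes follows. The function name `parse_file_size` is hypothetical, and accepting decimals and lowercase suffixes is an assumption made here rather than documented pipeline behaviour.

```python
import re

# Base-10 multipliers for the suffixes named in the config comment.
_MULTIPLIERS = {"K": 10 ** 3, "M": 10 ** 6, "G": 10 ** 9, "T": 10 ** 12}


def parse_file_size(text):
    """Convert a string such as '50G' or '10M' to a number of bytes."""
    match = re.match(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT])\s*$", str(text), re.IGNORECASE)
    if match is None:
        raise ValueError("file-size must be a number followed by one of T, G, M, K: {!r}".format(text))
    number, suffix = match.groups()
    return int(float(number) * _MULTIPLIERS[suffix.upper()])


if __name__ == "__main__":
    print(parse_file_size("50G"))
```

A value without a suffix, such as `50`, is rejected by this sketch, which keeps the units explicit.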
|
||
Due to PYTHONPATH issues, help can be found by typing: | ||
## Distributed Run | ||
|
||
* `cd toil-scripts/src` | ||
* `python -m toil_scripts.batch_alignment.bwa_alignment --help` | ||
|
||
To run on a distributed AWS cluster, see [CGCloud](https://github.com/BD2KGenomics/cgcloud) for instance provisioning, | ||
then run `toil-bwa run aws:us-west-2:example-jobstore-bucket --batchSystem=mesos --mesosMaster mesos-master:5050` | ||
to use the AWS job store and mesos batch system. |