
Scripts Reference implementations

Keiran Raine edited this page Mar 1, 2018 · 2 revisions

The scripts detailed here are reference implementations of various mapping and analysis algorithms.

They have been built in such a way that they can be run in two modes:

  1. Fire and forget - end-to-end processing using a single command
  2. Farm friendly - additional command line args allow specific steps to be triggered in isolation

bwa_mem.pl

Runs the 'BWA mem' method of mapping. Processes multiple lanes, merges, marks duplicates and provides a completed sample BAM/CRAM file including index and md5.

Input can be paired-fastq, interleaved-fastq, BAM or CRAM. Using BAM/CRAM as input is preferred and will allow header information to be transferred (important for library tracking in duplicate removal).

WARNING: mmqc

If you create a mapped file with the -mmqc flag, you need to preprocess it before remapping through bwa_mem.pl, because reads will have been marked as QC_FAIL and given an additional aux tag. bwa_mem.pl will detect whether this is necessary from the headers, provided they are intact.

To clean the file run:

bammaskflags maskneg=512 auxexists=mm < mmqc.bam > cleaned.bam

The cleaned.bam is now suitable for use as an input to bwa_mem.pl.
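
As a rough illustration of what the cleaning step does to each read's flag field, the sketch below clears the QC_FAIL bit (512, i.e. 0x200 in the SAM flag definitions) from an integer flag. It assumes that maskneg=512 simply unsets that bit; the function names are hypothetical, and the sketch does not model the auxexists=mm condition or read any BAM file.

```python
# Minimal sketch of the flag arithmetic only; not part of PCAP-core or biobambam.
QC_FAIL = 0x200  # 512: SAM flag bit for "not passing quality controls"


def is_qc_fail(flag: int) -> bool:
    """Return True if the QC_FAIL bit is set in a SAM flag value."""
    return bool(flag & QC_FAIL)


def clear_qc_fail(flag: int) -> int:
    """Unset the QC_FAIL bit and leave every other flag bit untouched."""
    return flag & ~QC_FAIL


if __name__ == "__main__":
    # 99 = paired, proper pair, mate reverse strand, first in pair
    marked = 99 | QC_FAIL
    print(marked, is_qc_fail(marked))            # 611 True
    cleaned = clear_qc_fail(marked)
    print(cleaned, is_qc_fail(cleaned))          # 99 False
```

A read that was never marked keeps its flag unchanged, so applying the mask to an already clean flag is harmless.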

bwa_aln.pl

Please consider this legacy.

Runs the BWA aln+sampe method of mapping (AKA BWA backtrack). Processes multiple lanes, merges, marks duplicates and provides a completed sample BAM file including index and md5.

The default installation of PCAP-core will not install the version of BWA required for this step (0.6.2). If you wish to use this script, you will need to build that version yourself and make it available on your path.
