These tools allow 1) sample demultiplexing of Illumina sequencing reads that are indexed with both sample-specific and molecule-specific (UMI) barcodes, and 2) consolidation of reads corresponding to the same original (pre-PCR) template molecule into a single representative read.
It is currently implemented for dual-index paired-end data and requires four input FASTQ files: forward and reverse reads (R1, R2) and index reads (I1, I2). Sample and molecular barcodes are extracted from the index reads. For example:
This example shows a dual sample-indexing scheme in which the 16-base sample barcode consists of two parts: 8 bases from index read 1 and 8 bases from index read 2. Index read 2 also contains an 8-base molecular barcode. We create a 'molecular index' by concatenating the molecular barcode with the first few (typically 6) bases of read 1. All read pairs with the same molecular index are presumed to be PCR products of the same original template molecule and are consolidated into a single representative read.
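A minimal Python sketch of how the two identifiers could be derived from the sequences of one read group. The layout of index read 2 (the 8 sample-barcode bases first, followed by the 8-base molecular barcode) is an assumption made for illustration, since the real positions depend on the library design; the function and constant names are hypothetical and not part of these tools.

```python
# Assumed layout (illustrative only, not taken from the tools' source):
#   I1 = 8-base sample barcode, part 1
#   I2 = 8-base sample barcode, part 2, followed by an 8-base molecular barcode
SAMPLE_BC_LEN = 8
MOL_BC_LEN = 8
R1_PREFIX_LEN = 6  # "typically 6" bases of read 1


def sample_barcode(i1_seq: str, i2_seq: str) -> str:
    """16-base sample barcode: 8 bases from I1 plus 8 bases from I2."""
    return i1_seq[:SAMPLE_BC_LEN] + i2_seq[:SAMPLE_BC_LEN]


def molecular_index(i2_seq: str, r1_seq: str, prefix_len: int = R1_PREFIX_LEN) -> str:
    """Molecular barcode from I2 concatenated with the first bases of R1."""
    mol_bc = i2_seq[SAMPLE_BC_LEN:SAMPLE_BC_LEN + MOL_BC_LEN]
    return mol_bc + r1_seq[:prefix_len]


# Example with made-up sequences
i1 = "ACGTACGT"
i2 = "TTTTGGGG" + "CCAACCAA"
r1 = "GATTACAGATTACA"
print(sample_barcode(i1, i2))   # ACGTACGTTTTTGGGG
print(molecular_index(i2, r1))  # CCAACCAAGATTAC
```

Read pairs whose `molecular_index` values are identical (within one sample) are the ones treated as copies of the same template.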
Features:
- Demultiplex reads based on sample barcodes
- Consolidate reads with the same molecular index (representing the same template molecule) into a single consensus read.
Requirements (Python packages):
- argparse
- HTSeq
Input:
Four FASTQ files: forward and reverse reads (R1, R2) and index reads (I1, I2). The default MiSeq settings do not generate index reads; see Configuring a MiSeq to output index reads.
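Because each read group spans four files, the records have to be read in lockstep. The sketch below is not the tools' own reader (they depend on HTSeq); it shows one way to do this with only the Python standard library, assuming plain or gzip-compressed four-line FASTQ records stored in the same order in all four files. The names `open_fastq`, `read_fastq` and `read_four` are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str  # header line without the leading '@'
    seq: str
    qual: str


def open_fastq(path: str) -> TextIO:
    """Open plain or gzip-compressed FASTQ (e.g. undemux.r1.fastq.gz) as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records until end of file."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield FastqRecord(header.rstrip("\n")[1:], seq, qual)


def read_four(r1: str, r2: str, i1: str, i2: str) -> Iterator[Tuple[FastqRecord, ...]]:
    """Yield (R1, R2, I1, I2) record tuples, checking that read IDs agree."""
    handles = [open_fastq(p) for p in (r1, r2, i1, i2)]
    try:
        for records in zip(*(read_fastq(h) for h in handles)):
            # The read ID is the first whitespace-separated field of the header.
            ids = {rec.name.split()[0] for rec in records}
            if len(ids) != 1:
                raise ValueError(f"read IDs out of sync: {sorted(ids)}")
            yield records
    finally:
        for h in handles:
            h.close()
```

Note that `zip` stops silently at the end of the shortest file, so a truncated file would not raise an error in this sketch.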
The example directory contains undemultiplexed data from an Illumina MiSeq run:
- example/undemux.r1.fastq.gz: Forward read
- example/undemux.r2.fastq.gz: Reverse read
- example/undemux.i1.fastq.gz: Index read 1
- example/undemux.i2.fastq.gz: Index read 2
Demultiplexing:
cd example
python ../demultiplex.py --min_reads 1000 --read1 undemux.r1.fastq.gz --read2 undemux.r2.fastq.gz --index1 undemux.i1.fastq.gz --index2 undemux.i2.fastq.gz --sample_barcodes samplekey.txt
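Conceptually, demultiplexing groups reads by their 16-base sample barcode. The sketch below is a simplified illustration, not the logic of demultiplex.py. It assumes the sample key has already been parsed into a barcode-to-sample dictionary (the format of samplekey.txt is not described here), and it assumes that `--min_reads` acts as a minimum read count for a barcode to get its own output bin, with all other reads pooled as "undetermined". Because the script uses argparse, `python demultiplex.py --help` should list the actual option semantics. The function name `demultiplex` and its arguments are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(records: Iterable[Tuple[str, str]],
                sample_key: Dict[str, str],
                min_reads: int) -> Dict[str, List[str]]:
    """Group read IDs into per-sample bins by their 16-base sample barcode.

    records:    (read_id, sample_barcode) pairs, e.g. barcode = I1[:8] + I2[:8]
    sample_key: hypothetical barcode -> sample-name mapping
    min_reads:  assumed minimum read count for a barcode to get its own bin
    """
    records = list(records)  # two passes: count first, then assign
    counts = Counter(barcode for _, barcode in records)

    bins: Dict[str, List[str]] = defaultdict(list)
    for read_id, barcode in records:
        if counts[barcode] >= min_reads:
            # Named after the sample if the barcode is in the key, else the raw barcode.
            bins[sample_key.get(barcode, barcode)].append(read_id)
        else:
            bins["undetermined"].append(read_id)
    return dict(bins)


reads = [("r1", "AAAA"), ("r2", "AAAA"), ("r3", "CCCC"), ("r4", "GGGG"), ("r5", "GGGG")]
print(demultiplex(reads, {"AAAA": "sample1"}, min_reads=2))
# {'sample1': ['r1', 'r2'], 'undetermined': ['r3'], 'GGGG': ['r4', 'r5']}
```

Counting before assigning needs the full list of barcodes in memory; a streaming implementation would instead make two passes over the input files.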
UMI tagging:
The UMI tag is added as the third field of the read name (header) line. It consists of the molecular barcode extracted from the index read, concatenated with the first six bases of R1.
python ../umitag.py --read1_in mysample.r1.fastq --read2_in mysample.r2.fastq --read1_out mysample.r1.umitagged.fastq --read2_out mysample.r2.umitagged.fastq --index1 mysample.i1.fastq --index2 mysample.i2.fastq
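A minimal sketch of the tagging step for a single read. It assumes an Illumina-style header with exactly two space-separated fields (so the UMI becomes the third) and assumes the molecular barcode occupies bases 9-16 of index read 2. The exact separator and tag format written by umitag.py may differ, and the helper names are hypothetical.

```python
SAMPLE_BC_LEN = 8   # assumed: first 8 bases of I2 belong to the sample barcode
MOL_BC_LEN = 8      # assumed: next 8 bases of I2 are the molecular barcode
R1_PREFIX_LEN = 6   # first six bases of R1, as described above


def umi_for(i2_seq: str, r1_seq: str) -> str:
    """Molecular barcode from I2 followed by the first six bases of R1."""
    mol_bc = i2_seq[SAMPLE_BC_LEN:SAMPLE_BC_LEN + MOL_BC_LEN]
    return mol_bc + r1_seq[:R1_PREFIX_LEN]


def tag_header(header: str, i2_seq: str, r1_seq: str) -> str:
    """Append the UMI as the third space-separated field of a FASTQ header."""
    fields = header.rstrip("\n").split(" ")
    if len(fields) != 2:
        raise ValueError(f"expected a two-field Illumina header, got: {header!r}")
    return f"{fields[0]} {fields[1]} {umi_for(i2_seq, r1_seq)}"


h = "@M01234:8:000000000-A1B2C:1:1101:15589:1331 1:N:0:1"
print(tag_header(h, "TTTTGGGGCCAACCAA", "GATTACAGATTACA"))
# @M01234:8:000000000-A1B2C:1:1101:15589:1331 1:N:0:1 CCAACCAAGATTAC
```

The same UMI is written into the R1 and R2 headers of a pair, which is what lets the two files be consolidated independently afterwards.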
Consolidation:
python ../consolidate.py mysample.r1.umitagged.fastq mysample.r1.consolidated.fastq 15 0.9
python ../consolidate.py mysample.r2.umitagged.fastq mysample.r2.consolidated.fastq 15 0.9
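consolidate.py takes an input FASTQ, an output FASTQ, and two numeric parameters (15 and 0.9 in the commands above). A plausible reading, stated here as an assumption rather than documented behavior, is that 15 is a minimum Phred base quality and 0.9 is the minimum fraction of reads in a UMI group that must agree on a base. The sketch below implements a per-position majority vote under that assumption: reads are grouped by UMI, bases below the quality threshold do not vote, and a position where no base reaches the frequency threshold becomes N. The function names and the (umi, seq, qual) input tuples are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (umi, sequence, quality string in Phred+33 encoding)
Read = Tuple[str, str, str]


def consolidate_position(bases: List[str], quals: List[int],
                         min_qual: int, min_freq: float) -> Tuple[str, int]:
    """Majority vote at one position; returns (base, Phred quality)."""
    # Only bases at or above min_qual count as votes.
    votes = Counter(b for b, q in zip(bases, quals) if q >= min_qual)
    if votes:
        base, n = votes.most_common(1)[0]
        # The frequency is taken over all reads in the group, not just the votes.
        if n / len(bases) >= min_freq:
            best_q = max(q for b, q in zip(bases, quals) if b == base)
            return base, best_q
    return "N", 0


def consolidate(reads: Iterable[Read], min_qual: int = 15,
                min_freq: float = 0.9) -> Dict[str, Tuple[str, str]]:
    """Collapse reads sharing a UMI into one consensus (sequence, quality)."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for umi, seq, qual in reads:
        groups[umi].append((seq, qual))

    consensus = {}
    for umi, members in groups.items():
        length = min(len(seq) for seq, _ in members)
        seq_out, qual_out = [], []
        for i in range(length):
            bases = [seq[i] for seq, _ in members]
            quals = [ord(qual[i]) - 33 for _, qual in members]
            base, q = consolidate_position(bases, quals, min_qual, min_freq)
            seq_out.append(base)
            qual_out.append(chr(q + 33))
        consensus[umi] = ("".join(seq_out), "".join(qual_out))
    return consensus


reads = [("U1", "ACGT", "IIII"), ("U1", "ACGA", "III#"), ("U1", "ACGT", "IIII"),
         ("U2", "TTTT", "IIII")]
print(consolidate(reads))  # {'U1': ('ACGN', 'III!'), 'U2': ('TTTT', 'IIII')}
```

In this example only 2 of the 3 U1 reads support T at the last position (the A has quality 2 and does not vote), which is below the 0.9 threshold, so that position becomes N. Positions beyond the shortest read in a group are dropped here; reads trimmed to different lengths may need different handling.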