Scripts to parse sequence data from epicPCR libraries.
10/29/2014 Sarah J. Spencer, Alm Lab, MIT
- Introduction
- Requirements
- Installation
- Command Line Arguments
- Maintainers
- References
Starting with raw fastq files of paired-end data, run the commands below to generate curated fasta files of 16S reads and Operational Taxonomic Units (OTUs) from epicPCR libraries. The scripts are either included in the QIIME package or are custom scripts available at
If you prepared an epicPCR reaction with new primer sets and target genes, you should modify the custom scripts or to recognize your tailored fusion structure.
- Perl version 5.10.1
- Python version 2.7.6
- QIIME version 1.8.0
- Perl is available for download from
- Python is available for download from
- QIIME is available for download from
- The custom Python and Perl scripts in this directory are not packaged in a module. No installation is necessary; simply download the scripts and run them using local installations of Perl, Python, and QIIME.
1 Fastq to fasta
1.1 Join paired-end sequences using fastq quality scores (QIIME script)
-f [fastq F] -r [fastq R] -o [output directory]
1.2 Extract multiplexed sample barcodes from joined fastq file (custom script)
perl [joined fastq] > [output file]
1.3 Quality filter and split sample libraries (QIIME script)
-i [joined fastq] -b [barcode file]
-o [output directory] -m [mapping file] --barcode_type 8
--min_per_read_length_fraction 0.40 -q 20 --max_barcode_errors 0
--max_bad_run_length 0
1.4 Separate individual samples into separate files (QIIME script)
-i [input fasta] -o [output fasta]
-s [sample ID]
1.5 Check for chimeras within stitched sequences (QIIME script)
-m usearch61 -i [input fasta]
--suppress_usearch61_ref -o [output directory]
1.6 Export fasta file with non-chimeric sequences (custom script)
python -i [input fasta] -n [non-chimeric sample IDs]
-o [output fasta]
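The filtering in step 1.6 can be illustrated with a minimal sketch. This is not the custom script itself: the function names `read_fasta` and `keep_non_chimeric` are hypothetical, the code is written for Python 3, and it assumes the sequence ID is the first whitespace-delimited token of each fasta header.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_non_chimeric(fasta_lines, non_chimeric_ids):
    """Return the records whose sequence ID appears in non_chimeric_ids."""
    wanted = set(non_chimeric_ids)
    kept = []
    for header, seq in read_fasta(fasta_lines):
        # Assumption: the ID is the first whitespace-delimited token of the header.
        seq_id = header.split()[0] if header.strip() else ""
        if seq_id in wanted:
            kept.append((header, seq))
    return kept
```

To use it on files, pass an open file handle as `fasta_lines` and a list of IDs read from the non-chimeric ID file, then write each kept record back out as a `>` header line followed by the sequence.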
1.7 Filter fasta sequences for fusion structure and export trimmed 16S sequences
(custom scripts, either for barcode-16S or dsrB-16S fusions)
(for bulk 16S data, use to trim read lengths)
python -i [input fasta] -l [16S length] -o [output fasta]
python -i [input fasta] -l [16S length] -o [output fasta]
python -i [input fasta] -l [16S length] -o [output fasta]
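As a rough illustration of the length trimming controlled by `-l [16S length]`, the sketch below trims each read to a fixed length. It does not reproduce the fusion-structure checks of the custom scripts. The function name `trim_records` is hypothetical, and two behaviors are assumptions rather than documented facts: reads shorter than the target are discarded, and the retained bases are taken from the start of the read.

```python
def trim_records(records, length):
    """Trim each (seq_id, sequence) pair to a fixed number of bases."""
    trimmed = []
    for seq_id, seq in records:
        if len(seq) < length:
            # Assumption: reads shorter than the target length are dropped.
            continue
        # Assumption: the bases to keep are at the start of the read.
        trimmed.append((seq_id, seq[:length]))
    return trimmed


reads = [("s1_1", "ACGTACGT"), ("s1_2", "ACG")]
print(trim_records(reads, 5))
```

Trimming to a common length keeps reads comparable when OTUs are picked later, since every sequence then covers the same number of bases.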
For barcode-16S fusions, collapse identical barcode-16S pairs into a
consensus sequence for downstream analysis (custom script)
python -i [input fasta] -o [output fasta]
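One way to picture the collapsing step is as de-duplication of (barcode, 16S) pairs with an abundance count. The sketch below takes that reading. It is not the custom script: the name `collapse_pairs` is hypothetical, and it assumes the barcode and trimmed 16S sequence have already been separated into tuples.

```python
from collections import Counter


def collapse_pairs(pairs):
    """Collapse identical (barcode, sequence) pairs and count each one.

    Returns (barcode, sequence, count) tuples, most abundant first.
    Ties keep the order in which the pairs were first seen.
    """
    counts = Counter(pairs)
    return [(bc, seq, n) for (bc, seq), n in counts.most_common()]


pairs = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGT"),
    ("GGTT", "ACGT"),
    ("AAAC", "ACGA"),
]
for barcode, seq, n in collapse_pairs(pairs):
    print(barcode, seq, n)
```

Keeping the count alongside each collapsed pair preserves how many reads supported it, which can help when deciding how much weight to give a pair downstream.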
2 Fasta to Operational Taxonomic Units (OTUs)
NOTE: all the following commands are from the QIIME pipeline
2.1 Pick OTUs using uclust
-i [input fasta] -o [output directory]
2.2 Pick representative OTU sequences based on abundance
-i [otu text file] -f [input fasta]
-m most_abundant -o [representative fasta]
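The idea behind `-m most_abundant` can be sketched as follows: within each OTU, the sequence that occurs most often is chosen as the representative. This is an illustration of the concept, not QIIME's implementation. The function names are hypothetical, and the OTU text file is assumed to be tab-delimited with the OTU ID in the first column and member sequence IDs in the remaining columns.

```python
from collections import Counter


def parse_otu_map(lines):
    """Parse tab-delimited lines of the form: otu_id, seq_id, seq_id, ..."""
    otus = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            otus[fields[0]] = fields[1:]
    return otus


def most_abundant_rep(otus, seqs):
    """Map each OTU ID to the ID of a read carrying its most common sequence."""
    reps = {}
    for otu_id, members in otus.items():
        if not members:
            continue  # nothing to choose from
        counts = Counter(seqs[m] for m in members)
        best_seq = counts.most_common(1)[0][0]
        # Report the first member read that carries the winning sequence.
        reps[otu_id] = next(m for m in members if seqs[m] == best_seq)
    return reps
```

Here `seqs` is a dictionary from sequence ID to sequence, built from the same fasta file that was used for OTU picking.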
2.3 Assign taxonomy to representative sequences using the greengenes database
-i [representative fasta] -o [output directory]
2.4 Make OTU table
-i [otu text file] -t [taxonomy text file]
-o [biom file]
2.5 Rarefactions to even the sequencing depth
NOTE: only perform this step if comparing sensitivity across samples
-i [biom file] -d [read depth]
-o [output directory] --lineages_included
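To show what rarefying to a fixed read depth means, the sketch below subsamples one sample's OTU counts without replacement. It illustrates the concept only and makes no claim about how QIIME implements it. The function name `rarefy` and the `seed` parameter are hypothetical, and returning `None` for samples with fewer reads than the requested depth is an assumption.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a dict of OTU counts down to exactly `depth` reads.

    Returns None when the sample holds fewer reads than `depth`.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand the counts into one entry per read, then draw without replacement.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


sample = {"OTU_0": 50, "OTU_1": 30, "OTU_2": 20}
print(rarefy(sample, 40))
```

Because the draw is random, rare OTUs may drop out of the rarefied sample entirely, which is why this step only makes sense when samples must be compared at equal depth.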
2.6 Summarize taxa based on OTU table
-i [biom file] -o [output directory]
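Summarizing taxa amounts to summing OTU counts that share the same lineage down to a chosen rank. The sketch below shows that aggregation for a single sample. It is not QIIME's code: the name `summarize_taxa` is hypothetical, the taxonomy strings are assumed to be semicolon-separated in the Greengenes style (`k__; p__; ...`), and OTUs without a taxonomy entry are labeled `Unassigned` by assumption.

```python
from collections import defaultdict


def summarize_taxa(otu_counts, taxonomy, level):
    """Sum OTU counts by lineage, truncated to the first `level` ranks."""
    totals = defaultdict(int)
    for otu_id, count in otu_counts.items():
        # Assumption: OTUs with no taxonomy entry are grouped as "Unassigned".
        lineage = taxonomy.get(otu_id, "Unassigned")
        ranks = [rank.strip() for rank in lineage.split(";")]
        totals[";".join(ranks[:level])] += count
    return dict(totals)


otu_counts = {"0": 5, "1": 3, "2": 2}
taxonomy = {
    "0": "k__Bacteria; p__Proteobacteria",
    "1": "k__Bacteria; p__Firmicutes",
    "2": "k__Bacteria; p__Proteobacteria",
}
print(summarize_taxa(otu_counts, taxonomy, level=2))
```

With `level=2` the counts are grouped by phylum; raising `level` groups by progressively finer ranks, as long as the taxonomy strings contain them.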
Current maintainers:
- Sarah J. Spencer ([email protected])
- Sarah P. Preheim ([email protected])
This material by ENIGMA - Ecosystems and Networks Integrated with Genes and Molecular Assemblies, a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological & Environmental Research under contract number DE-AC02-05CH11231.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335-336.