mestaki/qiime2-to-BugBase

Aim

This is a simple workflow for creating a BugBase-compatible biom file from within QIIME 2. The BugBase pre-print can be found here. For additional details see its documentation page. Because BugBase relies on PICRUSt for its functional category predictions, the output table should also be PICRUSt-compatible. Note that there is also a PICRUSt2, which users should consider if they are not looking to use BugBase. I was/am not involved with BugBase or PICRUSt in any capacity; this is just a crude workaround that I found useful and thought others might benefit from.

Here I outline my approach for use with 16S data, not shotgun metagenomic data. I've tested this with QIIME 2 version 2021.8, but this should work with any older version as well. First activate your QIIME 2 environment:

conda activate qiime2-2021.8

Since BugBase requires as its input an OTU table picked against the Greengenes database, I chose to cluster the DADA2-denoised reads from the Moving Pictures tutorial using vsearch.

You can use dereplicated sequences without denoising but I personally believe it’s better to utilise denoised reads even if OTU picking methods will ultimately be used.

First let's grab the required files:
We'll first need the feature table and representative sequences which are the output of DADA2 from the Moving Pictures tutorial.

You can download these manually from the links above or simply with wget.
The representative sequences:

wget https://docs.qiime2.org/2021.8/data/tutorials/moving-pictures/rep-seqs-dada2.qza 

The DADA2 processed feature-table:

wget https://docs.qiime2.org/2021.8/data/tutorials/moving-pictures/table-dada2.qza

We also need the sample metadata file:

wget -O "metadata.tsv" https://data.qiime2.org/2021.8/tutorials/moving-pictures/sample_metadata.tsv

Finally, we need the Greengenes reference database, which we can download from the resource page. I downloaded the latest 13_8 version here:

wget ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz
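Before moving on, it can be worth confirming that every download actually landed on disk. The following is a minimal sketch; report_missing is a hypothetical helper name (not part of QIIME 2 or wget), and the file names in the usage comment assume the defaults used in this tutorial.

```shell
# Print the name of every listed file that is missing or empty.
# report_missing is a hypothetical helper, not a QIIME 2 command.
report_missing() {
  for f in "$@"; do
    # -s is true only when the file exists and has a size greater than zero.
    [ -s "$f" ] || printf '%s\n' "$f"
  done
}

# Usage with the file names from this tutorial (no output means all are present):
# report_missing rep-seqs-dada2.qza table-dada2.qza metadata.tsv gg_13_8_otus.tar.gz
```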

You can extract the entire content of this tarball but since we only need a couple of items from it, I'm going to just extract the 2 files we need.
The first item is the unaligned representative sequences. Here I am using 97% clustered OTUs for demonstration purposes, but you can use whatever % you want, most likely 99%.

tar -xzvf gg_13_8_otus.tar.gz gg_13_8_otus/rep_set/97_otus.fasta

Next we'll also extract the corresponding taxonomies. Note, make sure your taxonomy % matches your OTU rep-set %.

tar -xzvf gg_13_8_otus.tar.gz gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
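One way to check that the rep-set and taxonomy files really do match is to compare their IDs. The sketch below counts sequence IDs in the FASTA file that have no row in the taxonomy file. It assumes each FASTA header starts with the OTU ID and that the taxonomy file is tab-separated with the ID in the first column; count_ids_missing_taxonomy is a hypothetical helper name.

```shell
# Count reference sequence IDs that have no entry in the taxonomy file.
# count_ids_missing_taxonomy is a hypothetical helper name.
count_ids_missing_taxonomy() {
  fasta=$1
  taxonomy=$2
  ids_fasta=$(mktemp)
  ids_tax=$(mktemp)
  # Keep only the ID token from each ">ID optional description" header.
  grep '^>' "$fasta" | sed 's/^>//' | cut -d ' ' -f 1 | sort -u > "$ids_fasta"
  # Assumed layout: ID, a tab, then the lineage string.
  cut -f 1 "$taxonomy" | sort -u > "$ids_tax"
  # comm -23 keeps only the lines unique to the first (FASTA) list.
  comm -23 "$ids_fasta" "$ids_tax" | wc -l | tr -d ' '
  rm -f "$ids_fasta" "$ids_tax"
}

# Usage (a result of 0 means every reference sequence has a taxonomy row):
# count_ids_missing_taxonomy gg_13_8_otus/rep_set/97_otus.fasta gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
```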

We’re going to need to import the 97% OTU fasta file (from the rep-set folder) into QIIME 2.

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path gg_13_8_otus/rep_set/97_otus.fasta \
  --output-path gg_97_otus.qza

Now I use vsearch to cluster the sequences at a 97% similarity threshold using a closed-reference approach. You can use whatever % identity you want here, as long as it matches the reference OTUs you imported.

qiime vsearch cluster-features-closed-reference \
  --i-sequences rep-seqs-dada2.qza \
  --i-table table-dada2.qza \
  --i-reference-sequences gg_97_otus.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-cr-97.qza \
  --o-clustered-sequences rep-seqs-cr-97.qza \
  --o-unmatched-sequences unmatched-seqs.qza \
  --verbose

BugBase requires that the biom table be in version 1.0 JSON format, and have taxonomic annotations instead of OTU IDs, so we need to do some adjustments first.

To do this we’ll need the underlying biom table within the table-cr-97.qza artifact.

qiime tools export \
  --input-path table-cr-97.qza \
  --output-path $PWD

This saves the exported biom table and calls it feature-table.biom.

Right now, our biom table has OTU ID# inherited from our reference database and not taxonomic annotations. So we’ll go ahead and add those taxonomies in. For this we need the 97_otu_taxonomy.txt file we extracted earlier.

In order to add taxonomy to our biom file first we need to add a new header to the 97_otu_taxonomy.txt file. We need to add #OTUID and taxonomy to our first line.

echo -e "#OTUID\ttaxonomy" | cat - gg_13_8_otus/taxonomy/97_otu_taxonomy.txt > 97_otu_taxonomy.txt
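The behaviour of echo -e differs between shells, so if the header comes out with a literal "-e" or "\t" in it, a printf-based variant may be more dependable. This is only a sketch of the same step; prepend_taxonomy_header is a hypothetical helper name.

```shell
# Write the "#OTUID<TAB>taxonomy" header followed by the original taxonomy file.
# prepend_taxonomy_header is a hypothetical helper name.
prepend_taxonomy_header() {
  # printf interprets \t and \n itself, so this does not rely on echo -e.
  { printf '#OTUID\ttaxonomy\n'; cat "$1"; } > "$2"
}

# Usage with the paths from this tutorial:
# prepend_taxonomy_header gg_13_8_otus/taxonomy/97_otu_taxonomy.txt 97_otu_taxonomy.txt
# head -n 1 97_otu_taxonomy.txt   # shows the new header line
```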

Now we're ready to add our taxonomy.

biom add-metadata -i feature-table.biom -o feature-table-tax.biom --observation-metadata-fp 97_otu_taxonomy.txt --sc-separated taxonomy

Finally, we need to convert our biom file to an older version (V1.0 JSON). I found a crude but simple solution: convert the biom file to a .txt file, then convert it back using the older biom version. There is likely a more elegant way of doing this, but it works.

First, convert this to a .txt file:

biom convert --table-type="OTU table" -i feature-table-tax.biom -o feature-table-tax.txt --to-tsv --header-key taxonomy

Then convert it back to the older biom version we need.

biom convert -i feature-table-tax.txt -o feature-table-tax-biom1.biom --table-type="OTU table" --to-json --process-obs-metadata taxonomy
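A quick way to confirm the conversion did what we wanted is to peek at the first bytes of the file: JSON-based biom files are plain text beginning with "{", whereas HDF5-based (version 2.x) files begin with the HDF5 signature. The sketch below relies only on that; biom_kind is a hypothetical helper name, and it does not validate the table beyond those first bytes.

```shell
# Report whether a file looks like JSON (BIOM 1.0) or HDF5 (BIOM 2.x).
# biom_kind is a hypothetical helper; it only inspects the first few bytes.
biom_kind() {
  if [ "$(head -c 1 "$1")" = "{" ]; then
    echo json
  elif [ "$(head -c 4 "$1" | tail -c 3)" = "HDF" ]; then
    # The HDF5 signature is the byte 0x89 followed by the letters "HDF".
    echo hdf5
  else
    echo unknown
  fi
}

# Usage with the file names from this tutorial:
# biom_kind feature-table-tax.biom        # expected to report hdf5
# biom_kind feature-table-tax-biom1.biom  # expected to report json
```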

This biom table is now compatible with BugBase. I validated this by uploading it to the web-based version of BugBase. Optionally, you can also upload a metadata file, with a couple of minor caveats. BugBase requires the first column to be named #SampleID, so you may have to rename it manually, and you must delete the second row that describes the column types (if your QIIME 2 metadata has one, as the Moving Pictures tutorial metadata does). You'll also want to save this file with a .txt extension.
The metadata file for this tutorial already has #SampleID as the first column header, but we still need to delete the second row and switch to a .txt extension, which BugBase is oddly picky about.

This command will accomplish both these requirements:

sed '2d' metadata.tsv > metadata.txt
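Note that sed '2d' removes the second line unconditionally, so on a metadata file without a column-types row it would silently drop the first sample. A slightly more defensive sketch is shown here; it assumes the types row starts with #q2:types, as it does in QIIME 2 metadata files, and clean_metadata is a hypothetical helper name.

```shell
# Copy a metadata file, dropping line 2 only if it is a "#q2:types" row.
# clean_metadata is a hypothetical helper name.
clean_metadata() {
  awk 'NR == 2 && /^#q2:types/ { next } { print }' "$1" > "$2"
}

# Usage with the file names from this tutorial:
# clean_metadata metadata.tsv metadata.txt
```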

You can now run this through BugBase. The image below was obtained using the web version of BugBase with all default settings and groups set to BodySite. It shows the predicted phenotypic trait (Aerobic) of the different body sites.
