QIIME 2 plugin for normalizing sequences by 16S rRNA gene copy number (GCN) based on rrnDB database

Jiung-Wen/q2-gcn-norm

q2-gcn-norm

Introduction:

This plugin normalizes sequences by 16S rRNA gene copy number (GCN) based on the rrnDB database (version 5.7). The script matches the taxonomy of each sequence against the rrnDB-5.7_pantaxa_stats_NCBI.tsv file, starting from the lowest rank. If a match is found, the mean GCN of that taxon is assigned; if not, the script tries the next higher rank, and so on until the highest rank is reached. Sequences that remain unassigned are assumed to have a GCN of 1.
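The lookup described above can be sketched roughly as follows. This is only an illustration and not the plugin's actual code: the function name `assign_gcn`, the `MEAN_GCN` dictionary, and the numbers in it are hypothetical (the values are made up, not taken from rrnDB).

```python
from typing import Dict, List

# Hypothetical lookup table: taxon name -> mean GCN.
# The values are invented for illustration and are NOT rrnDB data.
MEAN_GCN: Dict[str, float] = {
    "Bacteria": 2.0,
    "Firmicutes": 6.0,
    "Bacillus": 10.0,
}


def assign_gcn(lineage: List[str], table: Dict[str, float],
               default: float = 1.0) -> float:
    """Return the mean GCN of the lowest-rank taxon in `lineage` found in `table`.

    `lineage` is ordered from the highest rank to the lowest rank.
    If no rank matches, `default` (a GCN of 1) is returned.
    """
    for name in reversed(lineage):  # start from the lowest rank
        if name in table:
            return table[name]
    return default


# Matched at the genus level.
genus_hit = assign_gcn(["Bacteria", "Firmicutes", "Bacilli", "Bacillus"], MEAN_GCN)
# No match at class level, so the phylum value is used.
phylum_hit = assign_gcn(["Bacteria", "Firmicutes", "Clostridia"], MEAN_GCN)
# Nothing matches, so a GCN of 1 is assumed.
no_hit = assign_gcn(["Unassigned"], MEAN_GCN)
```

Walking the lineage in reverse means the most specific available estimate always wins, and the default only applies when no rank matches at all.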

Note that the mean column in rrnDB-5.7_pantaxa_stats_NCBI.tsv is, according to the rrnDB manual, calculated from the means of the pan-taxa of the immediately lower rank. Therefore, the mean GCN may differ from the result of an rrnDB online search. For example, the "mean" GCN for bacteria is 1.94 in the downloaded TSV file, whereas the mean GCN for all the bacterial taxa is 5.2 if you search the rrnDB online database.
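A toy calculation shows why a mean of child-taxon means can differ from a mean taken over all records at once. The numbers below are invented for illustration and are not rrnDB data, and the sketch assumes the online figure behaves like a pooled mean, which is an assumption rather than something the rrnDB manual is quoted as saying here.

```python
from statistics import mean

# Made-up copy numbers for genomes grouped by child taxon (NOT rrnDB data).
children = {
    "child_A": [1, 1],              # sparsely sampled, low copy number
    "child_B": [7, 7, 7, 7, 7, 7],  # heavily sampled, high copy number
}

# Mean of the child means: every child taxon counts equally.
mean_of_means = mean(mean(values) for values in children.values())

# Pooled mean: every genome counts equally, so heavily sampled taxa dominate.
pooled_mean = mean(x for values in children.values() for x in values)

print(mean_of_means, pooled_mean)
```

Here the mean of means is 4 while the pooled mean is 5.5, because the heavily sampled child taxon contributes six of the eight records.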

Install guide:

We assume you have a conda environment with the QIIME 2 Core distribution installed. First, activate the conda environment:

conda activate name-of-your-qiime2-env

Next, install q2-gcn-norm with the following command:

conda install -c jiungwenchen q2-gcn-norm

Running example:

We use artifacts from QIIME 2's "Moving Pictures" tutorial as test files. Use the following commands to download the files.

# DADA2 output artifact:
wget https://docs.qiime2.org/2019.10/data/tutorials/moving-pictures/table-dada2.qza

# Taxonomic analysis output artifact:
wget https://docs.qiime2.org/2019.10/data/tutorials/moving-pictures/taxonomy.qza

We can normalize the FeatureTable using the command below:

qiime gcn-norm copy-num-normalize \
  --i-table table-dada2.qza \
  --i-taxonomy taxonomy.qza \
  --o-gcn-norm-table table-normalized.qza

The output is an artifact of type FeatureTable[Frequency] % Properties('copy_number_normalized').
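Conceptually, copy number normalization can be pictured as dividing each feature's count by the GCN assigned to that feature. The sketch below assumes exactly that; the README does not spell out the arithmetic, so treat the division, the function name `normalize_counts`, and the example values as assumptions rather than the plugin's implementation.

```python
from typing import Dict


def normalize_counts(counts: Dict[str, Dict[str, float]],
                     gcn: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Divide every feature count by that feature's GCN.

    `counts` maps sample ID -> {feature ID -> count}.
    Features missing from `gcn` are assumed to have a GCN of 1.
    """
    return {
        sample: {feat: n / gcn.get(feat, 1.0) for feat, n in feats.items()}
        for sample, feats in counts.items()
    }


# Hypothetical data: featA has 5 copies per genome, featB was not assigned a GCN.
counts = {"sample1": {"featA": 100, "featB": 30}}
gcn = {"featA": 5.0}
normalized = normalize_counts(counts, gcn)
```

In this toy table, `featA` drops from 100 to 20.0 while `featB` keeps its count of 30.0, so the relative abundance of high-copy-number taxa shrinks after normalization.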

Note that the taxonomy format should follow Greengenes' k__foo; p__bar; c__ ... or SILVA's D_0__foo;D_1__bar;D_2__ .... Other formats, e.g. k__foo;p__bar;c__ ... (no space after the semicolon) or k__foo|p__bar|c__ ... (pipe as delimiter), are currently unsupported and will raise an error.
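If you are unsure which style your taxonomy strings use, a quick check along these lines may help before running the plugin. This is a hypothetical helper, not part of q2-gcn-norm, and the regular expressions only encode the delimiter and prefix conventions described above; labels without rank prefixes (such as a bare "Unassigned") are simply reported as unsupported by this check.

```python
import re

# Greengenes style: single-letter rank prefixes, ranks separated by "; ".
GREENGENES = re.compile(r"^[a-z]__[^;|]*(; [a-z]__[^;|]*)*$")
# SILVA style: D_<n>__ prefixes, ranks separated by ";" with no space.
SILVA = re.compile(r"^D_\d+__[^;|]*(;D_\d+__[^;|]*)*$")


def taxonomy_style(taxon: str) -> str:
    """Classify a taxonomy string as 'greengenes', 'silva', or 'unsupported'."""
    if GREENGENES.match(taxon):
        return "greengenes"
    if SILVA.match(taxon):
        return "silva"
    return "unsupported"


examples = [
    "k__Bacteria; p__Firmicutes; c__Bacilli",  # Greengenes style
    "D_0__Bacteria;D_1__Firmicutes",           # SILVA style
    "k__Bacteria;p__Firmicutes",               # no space after semicolon
    "k__Bacteria|p__Firmicutes",               # pipe delimiter
]
styles = [taxonomy_style(t) for t in examples]
```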

Now you can perform analyses as you usually do in QIIME 2 with the GCN-normalized FeatureTable. For example, let's run the ANCOM analysis with the new FeatureTable and compare the result from this example with that from the "Moving Pictures" tutorial.

# get the metadata from "Moving Pictures" tutorial
wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2019.10/tutorials/moving-pictures/sample_metadata.tsv"

# ANCOM analysis
qiime feature-table filter-samples \
  --i-table table-normalized.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where "[body-site]='gut'" \
  --o-filtered-table gut-table-normalized.qza
  
qiime taxa collapse \
  --i-table gut-table-normalized.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table gut-table-l6-normalized.qza

qiime composition add-pseudocount \
  --i-table gut-table-l6-normalized.qza \
  --o-composition-table comp-gut-table-l6-normalized.qza

qiime composition ancom \
  --i-table comp-gut-table-l6-normalized.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column subject \
  --o-visualization l6-ancom-subject-normalized.qzv

ANCOM output visualizations:

  • l6-ancom-subject.qzv (from the official tutorial)

  • l6-ancom-subject-normalized.qzv (from this example)
You may also want to compare the change in relative abundance using taxonomic bar plots.

Generally, GCN normalization may not have a huge impact on your analysis results, but someone (e.g. a reviewer or, in my case, a supervisor) may ask you to do it. For more discussion about GCN normalization, check the related topic on the QIIME 2 forum.

Versions

  • 2021.04 - updated to rrnDB version 5.7
  • 2019.11.1 - first working version, rrnDB version 5.6
  • 2019.11 - didn't work 🙃

Citations

Stoddard, S. F., Smith, B. J., Hein, R., Roller, B. R., & Schmidt, T. M. (2015). rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Research, 43(Database issue), D593–D598. https://doi.org/10.1093/nar/gku1201

Chen, M. Y., Chen, J. W., Wu, L. W., Huang, K. C., Chen, J. Y., Wu, W. S., Chiang, W. F., Shih, C. J., Tsai, K. N., Hsieh, W. T., Ho, Y. H., Wong, T. Y., Wu, J. H., & Chen, Y. L. (2021). Carcinogenesis of Male Oral Submucous Fibrosis Alters Salivary Microbiomes. Journal of Dental Research, 100(4), 397–405. https://doi.org/10.1177/0022034520968750
