FIX: transposing ranks for final output. (#99)
* FIX: transposing ranks for final output.  Updating documentation in readme accordingly

* Adding in transformers for metadata

* fixing import

* fixing index name for ranks

* Add transformer

* fixed ordering of rank calculations in q2

* TST: made transposes consistent

Found that standalone cli wasn't consistent with q2. Biases weren't factored into standalone ranks.
Now fixed.

* TST:minor refactor

* flake8

* Adding check for soils

* TST: adding additional check in cystic fibrosis study.

Also adding in cool figure, because, why not

* Adding in changelog update

* Update CHANGELOG.md
mortonjt committed Oct 17, 2019
1 parent c84460e commit 7457c87
Showing 16 changed files with 425 additions and 81 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,12 @@
# mmvec changelog

## Version 1.0.1 (2019-10-17)
# Enhancements
- Ranks are transposed and viewable in qiime metadata tabulate [#99](https://github.com/biocore/mmvec/pull/99)

# Bug fixes
- Ranks are now calculated consistently between q2 and standalone cli [#99](https://github.com/biocore/mmvec/pull/99)

## Version 1.0.0 (2019-09-30)
# Enhancements
- Paired heatmaps are available [#89](https://github.com/biocore/mmvec/pull/89)
33 changes: 20 additions & 13 deletions README.md
@@ -3,6 +3,8 @@
# mmvec
Neural networks for estimating microbe-metabolite interactions through their co-occurrence probabilities.

![](https://github.com/biocore/mmvec/raw/master/img/mmvec.png "mmvec")

# Installation

MMvec can be installed via pypi as follows
@@ -45,7 +47,7 @@ More information can found under `mmvec --help`

# Qiime2 plugin

If you want to make this qiime2 compatible, install this in your
If you want to run this in a qiime environment, install this in your
qiime2 conda environment (see qiime2 installation instructions [here](https://qiime2.org/)) and run the following

```
@@ -76,26 +78,22 @@ qiime mmvec paired-omics \
--o-conditionals ranks.qza \
--o-conditional-biplot biplot.qza
```

In the results, there are two files, namely `results/conditional_biplot.qza` and `results/conditionals.qza`. The conditional biplot is a biplot representation of the
conditional probability matrix so that you can visualize these microbe-metabolite interactions in an exploratory manner. This can be directly visualized in
Emperor as shown below. We also have the estimated conditional probability matrix given in `results/conditionals.qza`,
which can be unzipped to yield a tab-delimited table via `unzip results/conditionals`. Each row can be ranked,
so the most frequently co-occurring metabolites for a given microbe can be obtained by identifying the highest co-occurrence probabilities for each microbe.
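
As a rough illustration of the ranking step described above, here is a minimal pandas sketch. It assumes the table has been loaded with metabolites as rows and microbes as columns (the orientation used by the notebook in this commit); if your table is oriented the other way, transpose it first. The feature names, the values, and the `top_metabolites` helper are hypothetical and not part of mmvec.

```python
import pandas as pd

# Toy stand-in for the conditionals table (hypothetical names and values).
# Assumed orientation: rows are metabolites, columns are microbes.
ranks = pd.DataFrame(
    {"microbe_A": [1.2, -0.3, 0.4], "microbe_B": [-0.8, 0.9, 0.1]},
    index=["metabolite_1", "metabolite_2", "metabolite_3"],
)


def top_metabolites(ranks: pd.DataFrame, microbe: str, k: int = 2) -> pd.Series:
    """Return the k metabolites with the highest values for one microbe."""
    return ranks[microbe].nlargest(k)


top_a = top_metabolites(ranks, "microbe_A")
print(top_a)
```

With a real table, `ranks` would instead come from reading the tab-delimited file, for example with `pd.read_csv(path, sep='\t', index_col=0)` as the notebook in this commit does.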

It is worth your time to investigate the logs (labeled under `logdir**`) that are deposited using Tensorboard.
The actual logfiles within this directory are labeled `events.out.tfevents.*` : more discussion on this later.

These log conditional probabilities can also be viewed directly with `qiime metadata tabulate`. This can be
created as follows

Tensorboard can be run via
```
tensorboard --logdir .
qiime metadata tabulate \
--m-input-file results/conditionals.qza \
--o-visualization conditionals-viz.qzv
```

You may need to tinker with the parameters to get readable tensorflow results, namely `--p-summary-interval`,
`--epochs` and `--batch-size`.

A description of these two graphs is outlined in the FAQs below.


Then you can run the following to generate an Emperor biplot.

@@ -197,6 +195,14 @@ More information behind the actions and parameters can found under `qiime mmvec

3. More model parameters : The standalone script will return the bias parameters learned for each dataset (i.e. microbe and metabolite abundances). These are stored under the summary directory (specified by `--summary`) under the names `embeddings.csv`. This file will hold the coordinates for the microbes and metabolites, along with biases. There are 4 columns in this file, namely `feature_id`, `axis`, `embed_type` and `values`. `feature_id` is the name of the feature, whether it be a microbe name or a metabolite feature id. `axis` corresponds to the name of the axis, which either corresponds to a PC axis or bias. `embed_type` denotes if the coordinate corresponds to a microbe or metabolite. `values` is the coordinate value for the given `axis`, `embed_type` and `feature_id`. This can be useful for accessing the raw parameters and building custom biplots / ranks visualizations - this also has the advantage of requiring much less memory to manipulate.
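
To make the long format described above more concrete, the following sketch reshapes an `embeddings.csv`-style table into one wide table per embedding type. Only the four column names come from the description above; the feature names, the axis labels (`PC1`, `PC2`, `bias`), the `embed_type` labels (`microbe`, `metabolite`), the values, and the `wide_coordinates` helper are assumptions made for illustration, so check them against your own file.

```python
import io

import pandas as pd

# Hypothetical excerpt in the four-column layout described above.
# All feature names, labels and values here are made up.
csv_text = """feature_id,axis,embed_type,values
microbe_A,PC1,microbe,0.5
microbe_A,PC2,microbe,-0.1
microbe_A,bias,microbe,0.3
mz_100,PC1,metabolite,-0.2
mz_100,PC2,metabolite,0.7
mz_100,bias,metabolite,0.05
"""
embeddings = pd.read_csv(io.StringIO(csv_text))


def wide_coordinates(embeddings: pd.DataFrame, embed_type: str) -> pd.DataFrame:
    """Pivot one embedding type into a features-by-axes table."""
    subset = embeddings[embeddings["embed_type"] == embed_type]
    return subset.pivot(index="feature_id", columns="axis", values="values")


microbes = wide_coordinates(embeddings, "microbe")
metabolites = wide_coordinates(embeddings, "metabolite")
print(microbes)
print(metabolites)
```

The wide tables have one row per feature and one column per axis, which is usually a more convenient starting point for custom biplots than the long format.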

It is also important to note that you don't have to explicitly choose - it is very doable to run the standalone version first, then import those output files into qiime2. Importing can be done as follows

```
qiime tools import --input-path <your ranks file> --output-path conditionals.qza --type FeatureData[Conditional]
qiime tools import --input-path <your ordination file> --output-path ordination.qza --type 'PCoAResults % ("biplot")'
```

**Q** : You mentioned that you can use GPUs. How can you do that??

**A** : This can be done by running `pip install tensorflow-gpu` in your environment. See details [here](https://www.tensorflow.org/install/gpu).
@@ -209,7 +215,7 @@ At the moment, these capabilities are only available for the standalone CLI due

**Q** : I'm confused, what is Tensorboard?

**A** : Tensorboard is a diagnostic tool that runs in a web browser. To open tensorboard, make sure you’re in the mmvec environment and cd into the folder you are running the script above from. Then run:
**A** : Tensorboard is a diagnostic tool that runs in a web browser - note that this is only explicitly supported in the standalone version of mmvec. To open tensorboard, make sure you’re in the mmvec environment and cd into the folder you are running the script above from. Then run:

```
tensorboard --logdir .
@@ -237,7 +243,8 @@ The x-axis is the number of iterations (meaning times the model is training acro

The y-axis is the average number of counts off for each feature. The model is predicting the sequence counts for each feature in the samples that were set aside for testing. So in the graph above it means that, on average, the model is off by ~0.75 intensity units, which is low. However, this is ABSOLUTE error not relative error (unfortunately we don't know how to compute relative errors because of the sparsity in these datasets).
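
For intuition about the absolute error described above, here is a small NumPy sketch of a mean absolute error over held-out counts. It is not mmvec's internal code, and the exact way mmvec aggregates the error may differ; the arrays and the `mean_absolute_error` helper are hypothetical.

```python
import numpy as np

# Hypothetical held-out data: rows are test samples, columns are features.
observed = np.array([[3.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
predicted = np.array([[2.5, 0.5, 1.0], [0.5, 1.0, 0.5]])


def mean_absolute_error(observed: np.ndarray, predicted: np.ndarray) -> float:
    """Average absolute difference between observed and predicted values."""
    return float(np.mean(np.abs(observed - predicted)))


mae = mean_absolute_error(observed, predicted)
print(mae)
```

A relative error would divide each difference by the observed value, which is undefined wherever the observed count is zero; this is one way to see why sparse tables make relative errors awkward.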

You can also compare multiple runs with different parameters to see which run performed the best. If you are doing this, be sure to look at the `training-column` example to make the testing samples consistent across runs.
You can also compare multiple runs with different parameters to see which run performed the best. Useful parameters to note are `--epochs` and `--batch-size`. If you are committed to fine-tuning parameters, be sure to look at the `training-column` example to make the testing samples consistent across runs.


**Q** : What's up with the `--training-column` argument?

212 changes: 212 additions & 0 deletions examples/cf/check_rhamnolipids.ipynb
@@ -0,0 +1,212 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!ls"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[34mlatent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95\u001b[m\u001b[m\r\n",
"latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_embedding.txt\r\n",
"latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ordination.txt\r\n",
"latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt\r\n"
]
}
],
"source": [
"!ls testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Standalone check"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"fname = 'latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt'\n",
"ranks = pd.read_csv(f'testing/{fname}', sep='\\t', index_col=0)\n",
"microbe_metadata = pd.read_csv('microbe-metadata.txt', sep='\\t', index_col=0)\n",
"metabolite_metadata = pd.read_csv('metabolite-metadata.txt', sep='\\t', index_col=0)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"microbe_metadata = microbe_metadata.loc[ranks.columns]\n",
"i = microbe_metadata.Taxon.apply(lambda x: 'Pseudomonas' in x)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"pseudomonas = microbe_metadata.loc[i].index"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"metabolite_metadata = metabolite_metadata.dropna(subset=['expert_annotation'])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"19"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sum(ranks.loc[metabolite_metadata.index, pseudomonas[0]] > 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# qiime2 check"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
]
}
],
"source": [
"import qiime2\n",
"ranks = qiime2.Artifact.load('ranks.qza').view(pd.DataFrame)\n",
"microbe_metadata = pd.read_csv('microbe-metadata.txt', sep='\\t', index_col=0)\n",
"metabolite_metadata = pd.read_csv('metabolite-metadata.txt', sep='\\t', index_col=0)\n",
"microbe_metadata = microbe_metadata.loc[ranks.columns]\n",
"i = microbe_metadata.Taxon.apply(lambda x: 'Pseudomonas' in x)\n",
"pseudomonas = microbe_metadata.loc[i].index"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"metabolite_metadata = metabolite_metadata.dropna(subset=['expert_annotation'])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"19"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sum(ranks.loc[metabolite_metadata.index, pseudomonas[0]] > 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
6 changes: 3 additions & 3 deletions examples/cf/q2_run.sh
@@ -8,6 +8,7 @@ qiime mmvec paired-omics \
--p-learning-rate 1e-3 \
--o-conditionals ranks.qza \
--o-conditional-biplot biplot.qza \
--p-summary-interval 1 \
--verbose

qiime emperor biplot \
@@ -27,6 +28,5 @@ mmvec paired-omics \
--metabolite-file lcms_nt.biom \
--epochs 100 \
--learning-rate 1e-3 \
--summary-dir testing

qiime tools import --input-path testing/latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt --output-path ranks.qza --type FeatureData[Conditional]
--summary-interval 1 \
--summary-dir summary