diff --git a/CHANGELOG.md b/CHANGELOG.md index 564805a..ba2b892 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,12 @@ # mmvec changelog +## Version 1.0.1 (2019-10-17) +# Enhancements + - Ranks are transposed and viewable in qiime metadata tabulate [#99](https://github.com/biocore/mmvec/pull/99) + +# Bug fixes + - Ranks are now calculated consistently between q2 and standalone cli [#99](https://github.com/biocore/mmvec/pull/99) + ## Version 1.0.0 (2019-09-30) # Enhancements - Paired heatmaps are available [#89](https://github.com/biocore/mmvec/pull/89) diff --git a/README.md b/README.md index ffb2ae3..9643775 100644 --- a/README.md +++ b/README.md @@ -3,6 +3,8 @@ # mmvec Neural networks for estimating microbe-metabolite interactions through their co-occurence probabilities. +![](https://github.com/biocore/mmvec/raw/master/img/mmvec.png "mmvec") + # Installation MMvec can be installed via pypi as follows @@ -45,7 +47,7 @@ More information can found under `mmvec --help` # Qiime2 plugin -If you want to make this qiime2 compatible, install this in your +If you want to run this in a qiime environment, install this in your qiime2 conda environment (see qiime2 installation instructions [here](https://qiime2.org/)) and run the following ``` @@ -76,26 +78,22 @@ qiime mmvec paired-omics \ --o-conditionals ranks.qza \ --o-conditional-biplot biplot.qza ``` + In the results, there are two files, namely `results/conditional_biplot.qza` and `results/conditionals.qza`. The conditional biplot is a biplot representation the conditional probability matrix so that you can visualize these microbe-metabolite interactions in an exploratory manner. This can be directly visualized in Emperor as shown below. We also have the estimated conditional probability matrix given in `results/conditionals.qza`, which an be unzip to yield a tab-delimited table via `unzip results/conditionals`. Each row can be ranked, so the top most occurring metabolites for a given microbe can be obtained by identifying the highest co-occurrence probabilities for each microbe. -It is worth your time to investigate the logs (labeled under `logdir**`) that are deposited using Tensorboard. -The actual logfiles within this directory are labeled `events.out.tfevents.*` : more discussion on this later. - +These log conditional probabilities can also be viewed directly with `qiime metadata tabulate`. This can be +created as follows -Tensorboard can be run via ``` -tensorboard --logdir . +qiime metadata tabulate \ + --m-input-file results/conditionals.qza \ + --o-visualization conditionals-viz.qzv ``` -You may need to tinker with the parameters to get readable tensorflow results, namely `--p-summary-interval`, -`--epochs` and `--batch-size`. - -A description of these two graphs is outlined in the FAQs below. - Then you can run the following to generate a emperor biplot. @@ -197,6 +195,14 @@ More information behind the actions and parameters can found under `qiime mmvec 3. More model parameters : The standalone script will return the bias parameters learned for each dataset (i.e. microbe and metabolite abundances). These are stored under the summary directory (specified by `--summary`) under the names `embeddings.csv`. This file will hold the coordinates for the microbes and metabolites, along with biases. There are 4 columns in this file, namely `feature_id`, `axis`, `embed_type` and `values`. `feature_id` is the name of the feature, whether it be a microbe name or a metabolite feature id. `axis` corresponds to the name of the axis, which either corresponds to a PC axis or bias. `embed_type` denotes if the coordinate corresponds to a microbe or metabolite. `values` is the coordinate value for the given `axis`, `embed_type` and `feature_id`. This can be useful for accessing the raw parameters and building custom biplots / ranks visualizations - this also has the advantage of requiring much less memory to manipulate. +It is also important to note that you don't have to explicitly chose - it is very doable to run the standalone version first, then import those output files into qiime2. Importing can be done as follows + +``` +qiime tools import --input-path --output-path conditionals.qza --type FeatureData[Conditional] + +qiime tools import --input-path --output-path ordination.qza --type 'PCoAResults % ("biplot")' +``` + **Q** : You mentioned that you can use GPUs. How can you do that?? **A** : This can be done by running `pip install tensorflow-gpu` in your environment. See details [here](https://www.tensorflow.org/install/gpu). @@ -209,7 +215,7 @@ At the moment, these capabilities are only available for the standalone CLI due **Q** : I'm confused, what is Tensorboard? -**A** : Tensorboard is a diagnostic tool that runs in a web browser. To open tensorboard, make sure you’re in the mmvec environment and cd into the folder you are running the script above from. Then run: +**A** : Tensorboard is a diagnostic tool that runs in a web browser - note that this is only explicitly supported in the standalone version of mmvec. To open tensorboard, make sure you’re in the mmvec environment and cd into the folder you are running the script above from. Then run: ``` tensorboard --logdir . @@ -237,7 +243,8 @@ The x-axis is the number of iterations (meaning times the model is training acro The y-axis is the average number of counts off for each feature. The model is predicting the sequence counts for each feature in the samples that were set aside for testing. So in the graph above it means that, on average, the model is off by ~0.75 intensity units, which is low. However, this is ABSOLUTE error not relative error (unfortunately we don't know how to compute relative errors because of the sparsity in these datasets). -You can also compare multiple runs with different parameters to see which run performed the best. If you are doing this, be sure to look at the `training-column` example make the testing samples consistent across runs. +You can also compare multiple runs with different parameters to see which run performed the best. Useful parameters to note are `--epochs` and `--batch-size`. If you are committed to fine-tuning parameters, be sure to look at the `training-column` example make the testing samples consistent across runs. + **Q** : What's up with the `--training-column` argument? diff --git a/examples/cf/check_rhamnolipids.ipynb b/examples/cf/check_rhamnolipids.ipynb new file mode 100644 index 0000000..3c763d1 --- /dev/null +++ b/examples/cf/check_rhamnolipids.ipynb @@ -0,0 +1,212 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "!ls" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[34mlatent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95\u001b[m\u001b[m\r\n", + "latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_embedding.txt\r\n", + "latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ordination.txt\r\n", + "latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt\r\n" + ] + } + ], + "source": [ + "!ls testing" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Standalone check" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "fname = 'latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt'\n", + "ranks = pd.read_csv(f'testing/{fname}', sep='\\t', index_col=0)\n", + "microbe_metadata = pd.read_csv('microbe-metadata.txt', sep='\\t', index_col=0)\n", + "metabolite_metadata = pd.read_csv('metabolite-metadata.txt', sep='\\t', index_col=0)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "microbe_metadata = microbe_metadata.loc[ranks.columns]\n", + "i = microbe_metadata.Taxon.apply(lambda x: 'Pseudomonas' in x)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "pseudomonas = microbe_metadata.loc[i].index" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "metabolite_metadata = metabolite_metadata.dropna(subset=['expert_annotation'])" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "19" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.sum(ranks.loc[metabolite_metadata.index, pseudomonas[0]] > 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# qiime2 check" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import qiime2\n", + "ranks = qiime2.Artifact.load('ranks.qza').view(pd.DataFrame)\n", + "microbe_metadata = pd.read_csv('microbe-metadata.txt', sep='\\t', index_col=0)\n", + "metabolite_metadata = pd.read_csv('metabolite-metadata.txt', sep='\\t', index_col=0)\n", + "microbe_metadata = microbe_metadata.loc[ranks.columns]\n", + "i = microbe_metadata.Taxon.apply(lambda x: 'Pseudomonas' in x)\n", + "pseudomonas = microbe_metadata.loc[i].index" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "metabolite_metadata = metabolite_metadata.dropna(subset=['expert_annotation'])" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "19" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.sum(ranks.loc[metabolite_metadata.index, pseudomonas[0]] > 0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/examples/cf/q2_run.sh b/examples/cf/q2_run.sh index 42dc15f..fad670f 100644 --- a/examples/cf/q2_run.sh +++ b/examples/cf/q2_run.sh @@ -8,6 +8,7 @@ qiime mmvec paired-omics \ --p-learning-rate 1e-3 \ --o-conditionals ranks.qza \ --o-conditional-biplot biplot.qza \ + --p-summary-interval 1 \ --verbose qiime emperor biplot \ @@ -27,6 +28,5 @@ mmvec paired-omics \ --metabolite-file lcms_nt.biom \ --epochs 100 \ --learning-rate 1e-3 \ - --summary-dir testing - -qiime tools import --input-path testing/latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt --output-path ranks.qza --type FeatureData[Conditional] + --summary-interval 1 \ + --summary-dir summary diff --git a/examples/soils/check_soils.ipynb b/examples/soils/check_soils.ipynb index e2d904d..c407914 100644 --- a/examples/soils/check_soils.ipynb +++ b/examples/soils/check_soils.ipynb @@ -11,6 +11,13 @@ "%matplotlib inline" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# standalone check" + ] + }, { "cell_type": "code", "execution_count": 2, @@ -21,9 +28,9 @@ "output_type": "stream", "text": [ "\u001b[34mlatent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95\u001b[m\u001b[m\r\n", - "latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_embedding.csv\r\n", + "latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_embedding.txt\r\n", "latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ordination.txt\r\n", - "latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.csv\r\n" + "latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt\r\n" ] } ], @@ -38,7 +45,7 @@ "outputs": [], "source": [ "datadir = 'summarydir/latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95'\n", - "ranks = pd.read_csv(datadir + '_ranks.csv', index_col=0)" + "ranks = pd.read_csv(datadir + '_ranks.txt', index_col=0, sep='\\t')" ] }, { @@ -49,11 +56,12 @@ { "data": { "text/plain": [ - "(2,3-dihydroxy-3-methylbutanoate) 0.000000\n", - "(2,5-diaminohexanoate) 0.030698\n", - "(3-hydroxypyridine) 0.030141\n", - "(3-methyladenine) 0.032587\n", - "(4-oxoproline) 0.036312\n", + "featureid\n", + "(2,3-dihydroxy-3-methylbutanoate) -5.243021\n", + "(2,5-diaminohexanoate) -1.290612\n", + "(3-hydroxypyridine) 0.002373\n", + "(3-methyladenine) 0.971289\n", + "(4-oxoproline) 2.978444\n", "Name: rplo 1 (Cyanobacteria), dtype: float64" ] }, @@ -63,7 +71,7 @@ } ], "source": [ - "ranks.loc['rplo 1 (Cyanobacteria)'].head()" + "ranks['rplo 1 (Cyanobacteria)'].head()" ] }, { @@ -84,8 +92,8 @@ "metadata": {}, "outputs": [], "source": [ - "idx = ranks.loc['rplo 1 (Cyanobacteria)'] > 0 \n", - "detected_molecules = set(ranks.columns[idx])" + "idx = ranks['rplo 1 (Cyanobacteria)'] > 0 \n", + "detected_molecules = set(ranks.index[idx])" ] }, { @@ -168,19 +176,20 @@ { "data": { "text/plain": [ - "xanthine 0.030167\n", - "guanine 0.033443\n", - "(N6-acetyl-lysine) 0.031525\n", - "cytosine 0.033451\n", - "adenosine 0.038334\n", - "(3-methyladenine) 0.032587\n", - "4-guanidinobutanoate 0.031690\n", - "uracil 0.032330\n", - "hypoxanthine 0.032204\n", - "7-methyladenine 0.029252\n", - "N-acetylornithine 0.033029\n", - "succinate 0.033501\n", - "adenine 0.032958\n", + "featureid\n", + "cytosine 3.238725\n", + "xanthine 0.687712\n", + "N-acetylornithine 1.247421\n", + "uracil 1.778591\n", + "adenine 4.983674\n", + "(N6-acetyl-lysine) 4.423469\n", + "4-guanidinobutanoate 4.031901\n", + "guanine 3.107524\n", + "hypoxanthine 0.666798\n", + "7-methyladenine 0.302561\n", + "succinate 0.893106\n", + "(3-methyladenine) 0.971289\n", + "adenosine 4.995941\n", "Name: rplo 1 (Cyanobacteria), dtype: float64" ] }, @@ -190,7 +199,7 @@ } ], "source": [ - "ranks.loc['rplo 1 (Cyanobacteria)'].loc[microcoleus_metabolites]" + "ranks['rplo 1 (Cyanobacteria)'].loc[microcoleus_metabolites]" ] }, { @@ -202,6 +211,107 @@ "assert len(detected_molecules & microcoleus_metabolites) == 13" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# qiime2 check" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "import qiime2\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "ranks = qiime2.Artifact.load('ranks.qza').view(pd.DataFrame)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "idx = ranks['rplo 1 (Cyanobacteria)'] > 0 \n", + "detected_molecules = set(ranks.index[idx])\n", + "assert len(detected_molecules & microcoleus_metabolites) == 13" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "featureid\n", + "cytosine 3.147861\n", + "xanthine 0.842640\n", + "N-acetylornithine 1.281711\n", + "uracil 1.990830\n", + "adenine 5.086781\n", + "(N6-acetyl-lysine) 4.530147\n", + "4-guanidinobutanoate 4.027770\n", + "guanine 3.129724\n", + "hypoxanthine 0.713017\n", + "7-methyladenine 0.492983\n", + "succinate 0.805413\n", + "(3-methyladenine) 1.006567\n", + "adenosine 4.987744\n", + "Name: rplo 1 (Cyanobacteria), dtype: float64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ranks['rplo 1 (Cyanobacteria)'].loc[microcoleus_metabolites]" + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/examples/soils/metabolites.biom b/examples/soils/metabolites.biom new file mode 100644 index 0000000..4bd76fb Binary files /dev/null and b/examples/soils/metabolites.biom differ diff --git a/examples/soils/microbes.biom b/examples/soils/microbes.biom new file mode 100644 index 0000000..0791df7 Binary files /dev/null and b/examples/soils/microbes.biom differ diff --git a/examples/soils/run.sh b/examples/soils/run.sh index c92d40e..82c8442 100644 --- a/examples/soils/run.sh +++ b/examples/soils/run.sh @@ -6,3 +6,15 @@ mmvec paired-omics\ --latent-dim 1 \ --learning-rate 1e-3 \ --epochs 3000 + +qiime tools import --input-path microbes.biom --output-path microbes.biom.qza --type FeatureTable[Frequency] +qiime tools import --input-path metabolites.biom --output-path metabolites.biom.qza --type FeatureTable[Frequency] + +qiime mmvec paired-omics \ + --i-microbes microbes.biom.qza \ + --i-metabolites metabolites.biom.qza \ + --p-epochs 100 \ + --p-learning-rate 1e-3 \ + --o-conditionals ranks.qza \ + --o-conditional-biplot biplot.qza \ + --verbose diff --git a/img/mmvec.png b/img/mmvec.png new file mode 100644 index 0000000..376cf07 Binary files /dev/null and b/img/mmvec.png differ diff --git a/mmvec/__init__.py b/mmvec/__init__.py index 236cf4f..4f22ff9 100644 --- a/mmvec/__init__.py +++ b/mmvec/__init__.py @@ -1,5 +1,5 @@ from .heatmap import _heatmap_choices, _cmaps -__version__ = "1.0.0" +__version__ = "1.0.1" __all__ = ['_heatmap_choices', '_cmaps'] diff --git a/mmvec/multimodal.py b/mmvec/multimodal.py index ebb94e6..058986c 100644 --- a/mmvec/multimodal.py +++ b/mmvec/multimodal.py @@ -11,7 +11,7 @@ class MMvec(object): def __init__(self, u_mean=0, u_scale=1, v_mean=0, v_scale=1, batch_size=50, latent_dim=3, - learning_rate=0.1, beta_1=0.9, beta_2=0.95, + learning_rate=0.1, beta_1=0.8, beta_2=0.9, clipnorm=10., device_name='/cpu:0', save_path=None): """ Build a tensorflow model for microbe-metabolite vectors @@ -205,6 +205,16 @@ def __call__(self, session, trainX, trainY, testX, testY): tf.global_variables_initializer().run() + def ranks(self): + modelU = np.hstack( + (np.ones((self.U.shape[0], 1)), self.Ubias, self.U)) + modelV = np.vstack( + (self.Vbias, np.ones((1, self.V.shape[1])), self.V)) + + res = np.hstack((np.zeros((self.U.shape[0], 1)), modelU @ modelV)) + res = res - res.mean(axis=1).reshape(-1, 1) + return res + def fit(self, epoch=10, summary_interval=1000, checkpoint_interval=3600, testX=None, testY=None): """ Fits the model. diff --git a/mmvec/q2/_method.py b/mmvec/q2/_method.py index 515f4ca..0db3393 100644 --- a/mmvec/q2/_method.py +++ b/mmvec/q2/_method.py @@ -56,25 +56,12 @@ def paired_omics(microbes: biom.Table, loss, cv = model.fit(epoch=epochs, summary_interval=summary_interval) - U, V = model.U, model.V + ranks = pd.DataFrame(model.ranks(), index=train_microbes_df.columns, + columns=train_metabolites_df.columns) - U_ = np.hstack( - (np.ones((model.U.shape[0], 1)), - model.Ubias.reshape(-1, 1), U) - ) - V_ = np.vstack( - (model.Vbias.reshape(1, -1), - np.ones((1, model.V.shape[1])), V) - ) - - ranks = pd.DataFrame( - np.hstack((np.zeros((model.U.shape[0], 1)), U_ @ V_)), - index=train_microbes_df.columns, - columns=train_metabolites_df.columns) - - ranks = ranks - ranks.mean(axis=1).values.reshape(-1, 1) - ranks = ranks - ranks.mean(axis=0) - u, s, v = svds(ranks, k=latent_dim) + u, s, v = svds(ranks - ranks.mean(axis=0), k=latent_dim) + ranks = ranks.T + ranks.index.name = 'featureid' s = s[::-1] u = u[:, ::-1] v = v[::-1, :] diff --git a/mmvec/q2/_transformer.py b/mmvec/q2/_transformer.py index 846500e..b25a8d6 100644 --- a/mmvec/q2/_transformer.py +++ b/mmvec/q2/_transformer.py @@ -1,3 +1,4 @@ +import qiime2 import pandas as pd from mmvec.q2 import ConditionalFormat @@ -16,3 +17,8 @@ def _2(df: pd.DataFrame) -> ConditionalFormat: ff = ConditionalFormat() df.to_csv(str(ff), sep='\t', header=True, index=True) return ff + + +@plugin.register_transformer +def _3(ff: ConditionalFormat) -> qiime2.Metadata: + return qiime2.Metadata.load(str(ff)) diff --git a/mmvec/q2/tests/test_method.py b/mmvec/q2/tests/test_method.py index 0ce833b..68dc4fd 100644 --- a/mmvec/q2/tests/test_method.py +++ b/mmvec/q2/tests/test_method.py @@ -12,6 +12,7 @@ class TestMMvec(unittest.TestCase): def setUp(self): + np.random.seed(1) res = random_multimodal( num_microbes=8, num_metabolites=8, num_samples=150, latent_dim=2, sigmaQ=2, @@ -40,12 +41,14 @@ def setUp(self): def test_fit(self): np.random.seed(1) tf.reset_default_graph() - latent_dim = 2 tf.set_random_seed(0) + latent_dim = 2 res_ranks, res_biplot = paired_omics( self.microbes, self.metabolites, - epochs=1000, latent_dim=latent_dim + epochs=1000, latent_dim=latent_dim, + min_feature_count=1, learning_rate=0.1 ) + res_ranks = clr_inv(res_ranks.T) s_r, s_p = spearmanr(np.ravel(res_ranks), np.ravel(self.exp_ranks)) self.assertGreater(s_r, 0.5) diff --git a/mmvec/tests/test_multimodal.py b/mmvec/tests/test_multimodal.py index 464ff03..96eb8a8 100644 --- a/mmvec/tests/test_multimodal.py +++ b/mmvec/tests/test_multimodal.py @@ -18,6 +18,7 @@ class TestMMvec(unittest.TestCase): def setUp(self): # build small simulation + np.random.seed(1) res = random_multimodal( num_microbes=8, num_metabolites=8, num_samples=150, latent_dim=2, sigmaQ=2, @@ -49,11 +50,6 @@ def test_fit(self): coo_matrix(self.testX.values), self.testY.values) model.fit(epoch=1000) - modelU = np.hstack( - (np.ones((model.U.shape[0], 1)), model.Ubias, model.U)) - modelV = np.vstack( - (model.Vbias, np.ones((1, model.V.shape[1])), model.V)) - U_ = np.hstack( (np.ones((self.U.shape[0], 1)), self.Ubias, self.U)) V_ = np.vstack( @@ -62,7 +58,7 @@ def test_fit(self): u_r, u_p = spearmanr(pdist(model.U), pdist(self.U)) v_r, v_p = spearmanr(pdist(model.V.T), pdist(self.V.T)) - res = softmax(np.hstack((np.zeros((d1, 1)), modelU @ modelV))) + res = softmax(model.ranks()) exp = softmax(np.hstack((np.zeros((d1, 1)), U_ @ V_))) s_r, s_p = spearmanr(np.ravel(res), np.ravel(exp)) @@ -108,16 +104,11 @@ def test_soils(self): coo_matrix(self.testX.values), self.testY.values) model.fit(epoch=1000) - modelU = np.hstack( - (np.ones((model.U.shape[0], 1)), model.Ubias, model.U)) - modelV = np.vstack( - (model.Vbias, np.ones((1, model.V.shape[1])), model.V)) - ranks = pd.DataFrame( - np.hstack((np.zeros((d1, 1)), modelU @ modelV)), + model.ranks(), index=self.microbes.ids(axis='observation'), columns=self.metabolites.ids(axis='observation')) - ranks = ranks - ranks.mean(axis=1).values.reshape(-1, 1) + microcoleus_metabolites = [ '(3-methyladenine)', '7-methyladenine', '4-guanidinobutanoate', 'uracil', 'xanthine', 'hypoxanthine', '(N6-acetyl-lysine)', diff --git a/scripts/mmvec b/scripts/mmvec index 1164c6d..92a66dd 100644 --- a/scripts/mmvec +++ b/scripts/mmvec @@ -189,15 +189,14 @@ def paired_omics(microbe_file, metabolite_file, df.to_csv(embeddings_file, sep='\t') # Save to a ranks file - ranks = pd.DataFrame(model.U @ V.T, index=train_microbes_df.columns, + ranks = pd.DataFrame(model.ranks(), index=train_microbes_df.columns, columns=train_metabolites_df.columns) - ranks = ranks - ranks.mean(axis=1).values.reshape(-1, 1) + + u, s, v = svds(ranks - ranks.mean(axis=0), k=latent_dim) + ranks = ranks.T ranks.index.name = 'featureid' ranks.to_csv(ranks_file, sep='\t') - # Save to an ordination file - ranks = ranks - ranks.mean(axis=0) - u, s, v = svds(ranks, k=latent_dim) s = s[::-1] u = u[:, ::-1] v = v[::-1, :]