Checking for ratio values #303
Hi @pavlo888, thanks for using Qurro! To clarify, you are interested in the numerator and denominator values of a given log-ratio, correct? In this case, extracting these values may depend on how you've selected the features. If you searched by taxonomy, then you should be able to use the … If you are selecting features a different way (e.g. autoselection, manual, etc.), to my knowledge there is no way to extract these sums directly from Qurro. What you could do is download the selected features using the "Export Selected Features" option and then calculate the numerator and denominator yourself. The formula Qurro uses to calculate log-ratios is:

$$\text{log-ratio} = \ln\left(\frac{\text{sum of numerator feature counts}}{\text{sum of denominator feature counts}}\right)$$

so you can just calculate the sum of the numerator features as well as the sum of the denominator features. When @fedarko gets to this he may also have some advice 😄.
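Qurro's log-ratio for a sample is the natural log of (numerator sum / denominator sum). Here is a minimal sketch of that calculation for one sample, assuming you already have that sample's counts for the selected numerator and denominator features as plain lists (for example, after looking up the exported features in your feature table). The function name `qurro_style_log_ratio` and the example counts are hypothetical, not part of Qurro.

```python
import math
from typing import Optional, Sequence


def qurro_style_log_ratio(num_counts: Sequence[float],
                          denom_counts: Sequence[float]) -> Optional[float]:
    """Natural log of (sum of numerator counts / sum of denominator counts) for one sample."""
    num_sum = sum(num_counts)
    denom_sum = sum(denom_counts)
    # If either sum is 0, the log-ratio is undefined (log of 0, or division by 0)
    if num_sum == 0 or denom_sum == 0:
        return None
    return math.log(num_sum / denom_sum)


# Hypothetical sample: 3 numerator features and 2 denominator features
print(qurro_style_log_ratio([10, 5, 15], [20, 10]))  # 0.0, since 30 / 30 = 1
```

The numerator and denominator sums are the two "halves" you asked about; the log-ratio is just a function of them.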
Hi @gibsramen, that's exactly what I did! I searched based on taxonomy. I see the qarcoal command is available as a Python script, but is it also available in the QIIME 2 plug-in of Qurro? Cheers,
Yes, there is a QIIME 2 implementation of the qarcoal command as well. An example usage can be seen in the qarcoal example notebook, and I've reproduced the example QIIME 2 command below.
Hi @pavlo888, thanks for the kind comments! @gibsramen is correct: aside from using Qarcoal to replicate a taxonomy-based selection, there currently isn't an "easy" way to extract the summed numerator and denominator values of the selected features for each sample. We have an open issue (#178) to allow plotting the "raw" ratios instead of the log-ratios, but it sounds like what you would like are the actual summed numerator and denominator values (analogous to what Qarcoal gives you). Would a solution where we add this information to the "Export current sample plot data" output file (say, by making each sample have a …
Hi @fedarko, indeed I am looking for the raw ratios. I am interested in knowing the actual ratios of two specific genera. Can that be done with Qarcoal? I have already run it, but I am not sure how to interpret the output. Could you please help me out with this? Cheers,
There are a couple of ways of doing this.

Option 1. All you want to know is the "raw" ratios of two genera (and you don't care about the individual numerator or denominator values)

You can just use Qurro to select the log-ratio normally, export the selected log-ratios using the "Export current sample plot data" option, load that TSV into a pandas DataFrame (the filename below is a placeholder), and then compute the raw ratio:

```python
import math
import pandas as pd
sample_plot_data = pd.read_csv("sample_plot_data.tsv", sep="\t")  # placeholder filename
# You may need to filter out samples with a NaN or null log-ratio first
sample_plot_data["raw_ratio"] = math.e ** sample_plot_data["Current_Natural_Log_Ratio"]
```

This is possible because Qurro computes log-ratios, as @gibsramen mentioned above, by just taking the natural log of (sum of numerator feature counts / sum of denominator feature counts), so raising e to the power of a log-ratio gives back the raw ratio.

Option 2. You want to know the actual numerator and denominator values for each sample (i.e. each "half" of the ratio)

You can use Qarcoal for this. You can run Qarcoal with the … You can then load the resulting QZA into a pandas DataFrame in Python as shown here, and at this point you'll already have the numerator and denominator sums for each sample in the `Num_Sum` and `Denom_Sum` columns. From there, either of these works:

Option 2.1

```python
qarcoal_log_ratios_df["raw_ratio"] = qarcoal_log_ratios_df["Num_Sum"] / qarcoal_log_ratios_df["Denom_Sum"]
```

Option 2.2

Alternatively, you can also do the following (this is the way we did this with the Qurro sample plot data TSV in "Option 1"):

```python
import math
qarcoal_log_ratios_df["raw_ratio"] = math.e ** qarcoal_log_ratios_df["log_ratio"]
```

In closing

All three of these ways should give you the same answer (the same "raw ratios"). You may want to try them out to verify for yourself that this is true (there may be slight precision differences, but I doubt they'll be big enough to make a difference). Hope this helps.
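If you'd like to check that claim before touching real data, here is a small sketch on a made-up toy DataFrame that reuses the column names from the snippets above (`Num_Sum`, `Denom_Sum`, `log_ratio`). The sample names and counts are invented purely for illustration, and the `log_ratio` column is computed here the way the log-ratios are described above (natural log of numerator sum / denominator sum).

```python
import math

import numpy as np
import pandas as pd

# Made-up toy data using the column names from the snippets above
toy = pd.DataFrame(
    {"Num_Sum": [30.0, 12.0, 7.0], "Denom_Sum": [10.0, 48.0, 7.0]},
    index=["S1", "S2", "S3"],
)
# Log-ratio as described above: natural log of (numerator sum / denominator sum)
toy["log_ratio"] = np.log(toy["Num_Sum"] / toy["Denom_Sum"])

# Option 2.1: divide the sums directly
raw_from_sums = toy["Num_Sum"] / toy["Denom_Sum"]
# Options 1 and 2.2: exponentiate the natural log-ratio
raw_from_logs = math.e ** toy["log_ratio"]

# The two agree up to floating-point precision
print(np.allclose(raw_from_sums, raw_from_logs))  # True
print(raw_from_sums.tolist())  # [3.0, 0.25, 1.0]
```

`np.exp(toy["log_ratio"])` is an equivalent, slightly more idiomatic way to write the exponentiation.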
Hi @fedarko, thank you for your reply. I think the raw ratios obtained with option 2.1 are the output I am looking for. However, I am not very familiar with Python. I have tried running these commands (https://nbviewer.jupyter.org/github/biocore/qurro/blob/master/example_notebooks/qarcoal/qarcoal_example.ipynb#1.B.-Run-Qarcoal!) in Spyder, but I get some errors. Could you point me to a platform where I can easily run the suggested commands? Cheers,
I haven't used Spyder, but the necessary code to extract the raw ratios should be runnable through any Python interface.

First off, what sort of error(s) are you getting? If you wouldn't mind copying them here, this would help us figure out where things are going wrong (and it'll help people coming here from Google or wherever who might have the same problem).

Here's what I think the code to get "raw" ratios from Qarcoal output would look like, in some more detail. This should be run from within a QIIME 2 conda environment.

```python
import pandas as pd
from qiime2 import Artifact

# Load the output QIIME 2 artifact from Qarcoal
qarcoal_log_ratios = Artifact.load("your_qarcoal_output.qza")
# Convert the artifact to a pandas DataFrame
qarcoal_log_ratios_df = qarcoal_log_ratios.view(pd.DataFrame)
# Make a new column in the DataFrame, "raw_ratio"
qarcoal_log_ratios_df["raw_ratio"] = qarcoal_log_ratios_df["Num_Sum"] / qarcoal_log_ratios_df["Denom_Sum"]
# Save the Qarcoal output (including the raw_ratio column we just added) to a TSV file
qarcoal_log_ratios_df.to_csv("raw_ratio_info.tsv", sep="\t")
```

This should accomplish what you want, I think. Let us know if this works!
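One possible tweak, sketched under an assumption: if any sample in the Qarcoal output has a `Denom_Sum` of 0 (I haven't checked whether Qarcoal can output such samples), the division above gives `inf` in pandas. The hypothetical helper `add_raw_ratio` below returns a copy of the DataFrame with `raw_ratio` set to NaN for those samples instead.

```python
import numpy as np
import pandas as pd


def add_raw_ratio(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a raw_ratio column (Num_Sum / Denom_Sum).

    Samples whose Denom_Sum is 0 get NaN rather than inf.
    """
    out = df.copy()
    # Keep denominators that are nonzero; zeros become NaN, so those divisions yield NaN
    safe_denom = out["Denom_Sum"].where(out["Denom_Sum"] != 0)
    out["raw_ratio"] = out["Num_Sum"] / safe_denom
    return out
```

With the DataFrame from the script, this would be used as `add_raw_ratio(qarcoal_log_ratios_df).to_csv("raw_ratio_info.tsv", sep="\t")`.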
Hi @fedarko, it worked perfectly!!!! I opened IPython in the terminal with the QIIME 2 conda environment active, followed your script, and it worked great! Thanks a lot! Now, for the interpretation, I just want to make sure I am doing it right. Or what is the correct wording for this type of output? Thank you in advance for your amazing support!
Glad that worked! Yes, the …

The reason many compositional data analysis techniques generally use log-ratios instead of just raw ratios is that logarithms symmetrize things between the numerator and denominator around 0:

$$\ln\left(\frac{3}{1}\right) \approx 1.0986 \quad\text{and}\quad \ln\left(\frac{1}{3}\right) \approx -1.0986$$

but

$$\frac{3}{1} = 3 \quad\text{and}\quad \frac{1}{3} \approx 0.3333$$

More generally, $\ln\left(\frac{a}{b}\right) = -\ln\left(\frac{b}{a}\right)$. So, using the log-ratio (rather than just the raw ratio) gives equal weight to the numerator and denominator, making it easier to compare samples (and enabling the use of ordinary statistical tools, e.g. t-tests). To quote this paper, emphasis mine:
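As a small check of the point that logarithms symmetrize the numerator and denominator around 0, here is a sketch using a 3-to-1 vs. 1-to-3 example; the helper name `raw_and_log` is just for illustration.

```python
import math
from typing import Tuple


def raw_and_log(num: float, denom: float) -> Tuple[float, float]:
    """Return (raw ratio, natural log-ratio) for a numerator/denominator pair."""
    ratio = num / denom
    return ratio, math.log(ratio)


for num, denom in [(3, 1), (1, 3)]:
    raw, log_ratio = raw_and_log(num, denom)
    print(f"{num}/{denom}: raw = {raw:.4f}, ln = {log_ratio:.4f}")
# 3/1: raw = 3.0000, ln = 1.0986
# 1/3: raw = 0.3333, ln = -1.0986
```

Swapping numerator and denominator only flips the sign of the log-ratio, whereas the raw ratios (3 vs. about 0.33) sit at very different distances from 1.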
Hi @fedarko, thanks a lot for the great insight and explanation. From what you have mentioned above, I assume it would be more advisable to discuss these kinds of results based on the log-ratio, right? Cheers,
In general, yes, I would suggest discussing log-ratios rather than just raw ratios. It might seem less intuitive at first, but (in my opinion) the advantages outweigh the disadvantages. If you'd like further background on log vs. non-log ratios, you might want to check out this issue thread and/or Modeling and Analysis of Compositional Data, page 14.
I'm going to close this for now, but please feel free to open a new issue if you have any other questions. Best,
Hi @fedarko, I was wondering if you have any experience with plotting the results from Qurro in a PCA plot? I have a DataFrame that looks like this: Any idea? I tried using ggbiplot in R, but I can't make it work. Thank you in advance. Cheers,
Not super sure what you mean; what's your goal with this analysis? I am unclear on how you'd go from Qurro's results (log-ratios) to PCA. The more common use case would be going the opposite way, i.e. using the feature loadings in a PCA biplot (which I think is what was shown in the rank plot in your first post in this thread?) as input to Qurro to guide the selection of log-ratios that differ across sample types. I guess you could do something like select two log-ratios and then use those as the axes in a scatterplot, which would probably look kinda like a PCA, but I'm not sure that would be more meaningful than just showing two box (or jitter/violin) plots of the different log-ratios.
Hi @fedarko, yes indeed. That is exactly what I got. Apologies for not explaining myself clearly. I took two log-ratios as the axes of a scatterplot in order to obtain a PCA. My main goal is to show, more clearly and in a summarized way, the families that are coupled with the differential log-ratios. Do you think this would be a good approach? Cheers,
I guess that could be useful; I remember similar scatterplots of two log-ratios were talked about by @mortonjt in the context of microbe-metabolite datasets (biocore/mmvec#76) a while back. That said, I don't think a scatterplot between two log-ratios can be called a PCA: it would just be a normal scatterplot (although it would be interpretable, kind of, as a basic form of "dimensionality reduction"). It should be possible to make this scatterplot in pretty much any plotting software (ggplot / matplotlib / etc.), I think? Looking at the data you posted earlier, I am a bit confused: it seems like you have coordinates defined for features, not for samples. If you want to make a scatterplot in the way described above, I think the way to do that (assuming you want to use Qurro to select the log-ratios) is to select one log-ratio in Qurro, export it using the …
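Here is a rough sketch of the "select two log-ratios and use them as the axes of a scatterplot" idea in Python, assuming you exported each log-ratio from Qurro as its own sample plot data TSV. It assumes each TSV has a per-sample ID column (called `Sample ID` here, which is a guess, so check your file's header) plus the `Current_Natural_Log_Ratio` column; the function names `combine_log_ratios` and `plot_log_ratios` are hypothetical.

```python
import pandas as pd


def combine_log_ratios(first: pd.DataFrame, second: pd.DataFrame,
                       sample_col: str = "Sample ID",  # assumed column name
                       ratio_col: str = "Current_Natural_Log_Ratio") -> pd.DataFrame:
    """Join two per-sample log-ratio tables into one row per sample."""
    a = first[[sample_col, ratio_col]].rename(columns={ratio_col: "log_ratio_1"})
    b = second[[sample_col, ratio_col]].rename(columns={ratio_col: "log_ratio_2"})
    # Inner join keeps samples present in both exports; then drop NaN log-ratios
    combined = a.merge(b, on=sample_col, how="inner")
    return combined.dropna(subset=["log_ratio_1", "log_ratio_2"])


def plot_log_ratios(combined: pd.DataFrame) -> None:
    """Scatterplot with one point per sample and one log-ratio per axis."""
    # Imported here so combine_log_ratios() also works without matplotlib installed
    import matplotlib.pyplot as plt

    ax = combined.plot.scatter(x="log_ratio_1", y="log_ratio_2")
    ax.set_xlabel("Log-ratio 1 (natural log)")
    ax.set_ylabel("Log-ratio 2 (natural log)")
    plt.show()
```

Usage would be something like `plot_log_ratios(combine_log_ratios(pd.read_csv("log_ratio_1.tsv", sep="\t"), pd.read_csv("log_ratio_2.tsv", sep="\t")))`, where both filenames are placeholders. Each point is a sample, so sample metadata could be merged in afterwards to color points by group.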
Hi @fedarko, I am a bit confused by the last thing you suggested, but I have built the "PCA" and it looks like this: Personally, this would work for me. Thanks a lot for your help!
Hi,
First of all, great plug-in!!!! I think Qurro is a really powerful tool, and I have used it successfully so far. However, is there a way to find out the actual ratio values (e.g. 1 to 3 or 1 to 1) when I select two taxa?
In the attached file, I selected two taxa and obtained the ratios. Is there a way to know what the actual ratios are?
I hope you understand my inquiry. Thank you in advance for your help!
Cheers,
Pablo