…r potential use in other modules
… generate associated files for the new 'rsem_stranded_no-mito_log' filename_lead value
…erated stranded rsem without mito umap
analyses/transcriptomic-dimension-reduction/01-dimension-reduction.sh
This looks good to me code-wise, but I am a bit concerned about whether the UMAPs are what they should be, given https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1658/files#r1089064606
I'll wait for confirmation/rerunning before approval...
analyses/transcriptomic-dimension-reduction/scripts/run-dimension-reduction.R
tidyr::gather("Kids_First_Biospecimen_ID",
              "fpkm",
              tidyselect::starts_with("BS_")) %>%
dplyr::mutate(log2_fpkm = log2(fpkm+1)) %>%
Just want to note my minor discomfort with +1 as a zero-correction for FPKM, but it does not really matter in this case.
For better or worse, this is all up in OpenPBTA :/
I was about to approve this and say that this looks good to me, but then I realized that this may not actually be sufficient for "removing" the mitochondrial genes. The FPKM calculation will still be using the mitochondrial genes in the denominator, which will affect the expression values for all of the other genes before we get to this point. So we may want to remove the mitochondrial genes from the count files, then recalculate FPKM.
I think we can do this using the count matrices that we have, though I believe the original FPKM calculation occurs upstream, so we might not get precisely the same results if we reimplement the FPKM calculations. (So we should recalculate both with and without MT.) Before attempting this, we should probably discuss whether it is the right thing to spend time on at this moment.
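To make the denominator concern concrete, here is a sketch assuming the standard FPKM definition. The symbols are introduced here for illustration and do not come from the pipeline: c_i is the fragment count for gene i, ℓ_i is its length in bases (RSEM-style pipelines typically use an effective length), and N is the total fragment count across all genes, mitochondrial ones included. The upstream calculation may differ in its details.

```latex
\[
\mathrm{FPKM}_i = \frac{10^{9}\, c_i}{\ell_i\, N},
\qquad
N = \sum_{j} c_j
\]
% Dropping the MT rows after this step leaves N unchanged.
% Recomputing without mitochondrial genes uses a smaller denominator:
\[
N' = N - \sum_{j \in \mathrm{MT}} c_j,
\qquad
\mathrm{FPKM}'_i = \frac{10^{9}\, c_i}{\ell_i\, N'} = \mathrm{FPKM}_i \cdot \frac{N}{N'}
\]
```

Under this definition, every remaining gene in a sample is scaled by the same factor N/N', and that factor grows with the sample's mitochondrial fraction, so samples with unusually high mitochondrial expression are the ones most affected.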
analyses/transcriptomic-dimension-reduction/05-seq-center-mitochondrial-genes.Rmd
…chondrial-genes.Rmd Co-authored-by: Joshua Shapiro <[email protected]>
My main followup thought here is that we can probably do a rough version of this using the existing FPKM or TPM matrices: With TPM, I think we can re-sum the TPM values for each sample, excluding mitochondrial genes, divide by the sum/10^6 to rescale. That won't be quite right for FPKM, but it will probably be close enough. Still, I'd probably start with TPM for this notebook (calculating UMAP from the original TPM and then from the mito-excluded & rescaled TPM) as we are just looking to see if there is an effect of removing the mitochondrial genes. If we feel we need to go back to FPKM calculations later for better consistency with previous analyses, we can still do that.
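The rescaling described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code in the module: the toy matrix, the sample IDs, the gene names, and the "MT-" prefix used to pick out mitochondrial genes are all hypothetical.

```r
# Hypothetical toy TPM matrix: genes in rows, samples in columns.
# Each column sums to 10^6, as TPM values do by definition.
tpm <- matrix(
  c(500000, 300000, 200000,   # BS_0001
    250000, 250000, 500000),  # BS_0002
  nrow = 3,
  dimnames = list(
    c("MT-CO1", "GENE_A", "GENE_B"),
    c("BS_0001", "BS_0002")
  )
)

# Assumption: mitochondrial genes can be identified by an "MT-" prefix.
# Real gene identifiers may need a different pattern.
is_mito <- grepl("^MT-", rownames(tpm))

# Drop the mitochondrial rows, keeping the matrix structure.
tpm_nomito <- tpm[!is_mito, , drop = FALSE]

# Re-sum each sample without mitochondrial genes and divide by sum / 10^6,
# so every sample sums to 10^6 again.
scale_factor <- colSums(tpm_nomito) / 1e6
tpm_rescaled <- sweep(tpm_nomito, 2, scale_factor, FUN = "/")

print(colSums(tpm_rescaled))
```

Working on the matrix directly with colSums() and sweep() avoids reshaping to long format and back, which matters little for a one-off check but scales better on a full expression matrix.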
…plore those UMAPs
@jashapiro We have some TPM UMAPs 😄 For this review, please let* (edit) me know in particular any feedback on how I did the TPM prep (scripts being called from right places, and
This looks fine. Glad to see that renormalizing doesn't really have much of an effect, and I like the additional comment you added about how this doesn't correct for everything that we might see from protocol differences.
The html file in the repo is not up to date, however, so that needs to be added before approving. I don't know if other data files are up to date, but I imagine a full rerun will address both.
Minor comments that do not need to be addressed here:
I probably wouldn't have created a separate script for the TPM matrix filtering, as you already had most of the filtering code in the main script, but I don't see any major problem for this check.
If this were more than a one-off, I might also suggest not doing the gather/spread, but instead converting to a matrix and using matrix ops for efficiency, but for this it really doesn't matter.
It also seems odd to me that the umap tsv files have all of the metadata in them, but not worth the effort to change here.
Rscript --vanilla scripts/run-dimension-reduction.R \
  --expression ${NOMITO_TPM_FILE} \
  --metadata ${METADATA} \
  --remove_mito_genes \
I hope this isn't needed here? The mito genes should have already been removed, correct?
Ah yes, and the flag in general is not needed at all anymore if we're not doing this with FPKM. I'll sort that all out.
analyses/transcriptomic-dimension-reduction/scripts/prepare-tpm-for-umap.R
…m-for-umap.R Co-authored-by: Joshua Shapiro <[email protected]>
…t is no longer used
@jashapiro As of now, CI is passing through this module, so it's ready for another look! I updated the module to wholesale remove the Since the tumor purity had a diff (see my last bit here #1658 (comment)), I also decided just to be safe to re-run that module as well. All the notebook diffs are HTML miscellany.
LGTM. Still not 100% sure what to do with this information, but that is separate from the analysis itself.
This PR explores, as part of manuscript revisions, how dimension reduction might change if we do not include mitochondrial genes, as inspired by this issue: #1601
Specifically, it seems that one sequencing center has some problematic mitochondrial gene expression distributions that may have resulted from using a different kit whose identity may also be lost to the ages.
There are two main changes in this module:
I updated the transcriptomic-dimension-reduction module to include an option to remove mitochondrial genes. I added this run for stranded RSEM with log2 (skipping t-SNE) in order to generate data for this exploration. But I did not add this run to the 02 plot script, since we don't really need those plots.
I added a notebook to transcriptomic-dimension-reduction to explore the mitochondrial expression, first by looking at the FPKMs of mito genes for relevant diagnoses and then via UMAPs made with and without mitochondrial genes. I don't see anything too different here, but welcome thoughts on interpretation. Here's that notebook: 05-seq-center-mitochondrial-genes.nb.html.zip
Note that I had some merge conflicts bringing master in, so I had to open and clean up some of the tumor purity scripts (including re-rendering a notebook), and along the way VS Code ate lots of EOL spaces, so that's why those diffs are here!