UMAPs without mitochondria #1658

sjspielman · 2023-01-26T14:33:23Z

This PR explores, as part of manuscript revisions, how dimension reduction might change if we do not include mitochondrial genes, as inspired by this issue: #1601
Specifically, it seems that one sequencing center has some problematic mitochondrial gene expression distributions that may have resulted from using a different kit whose identity may also be lost to the ages.

There are two main changes in this module:

I updated the transcriptomic-dimension-reduction module to include an option to remove mitochondrial genes. I added this run for stranded RSEM with log2 (skipping t-SNE) in order to generate data for this exploration. However, I did not add this run to the 02 plot script, since we don't really need those plots.
I added a notebook to transcriptomic-dimension-reduction to explore the mitochondrial expression, first via just looking at FPKMs of mito genes for relevant diagnoses and then via UMAPs made with and without mitochondrial genes. I don't see anything too different here, but welcome thoughts on interpretation. Here's that notebook:
05-seq-center-mitochondrial-genes.nb.html.zip

Note that I had some merge conflicts bringing master in, so I had to open and clean up some of the tumor purity scripts (including re-rendering a notebook). Along the way, VS Code ate lots of EOL spaces, so that's why those diffs are here!

analyses/transcriptomic-dimension-reduction/01-dimension-reduction.sh

jashapiro

This looks good to me code-wise, but I am a bit concerned about whether the UMAPs are what they should be, given https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1658/files#r1089064606

I'll wait for confirmation/rerunning before approval...

analyses/transcriptomic-dimension-reduction/scripts/run-dimension-reduction.R

jashapiro · 2023-01-27T15:27:26Z

analyses/transcriptomic-dimension-reduction/05-seq-center-mitochondrial-genes.Rmd

+  tidyr::gather("Kids_First_Biospecimen_ID", 
+                "fpkm", 
+                tidyselect::starts_with("BS_")) %>%
+  dplyr::mutate(log2_fpkm = log2(fpkm+1)) %>%


Just want to note my minor discomfort with +1 as a zero-correction for FPKM, but it does not really matter in this case.

For better or worse, this is all up in OpenPBTA :/
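To make the pseudocount concern concrete, here is a minimal sketch with made-up FPKM values (not data from this PR). One plausible reason for the discomfort, which the comment above does not spell out, is that FPKM values are often well below 1, so adding 1 compresses differences among lowly expressed genes.

```r
# Made-up FPKM values spanning several orders of magnitude (illustrative only)
fpkm <- c(0, 0.01, 0.1, 1, 10, 100)

# Transformation used in the notebook snippet above
log2_fpkm <- log2(fpkm + 1)

# The true difference between 0.1 and 0.01 FPKM is 10-fold (about 3.32 log2 units)
true_log2_diff <- log2(0.1 / 0.01)

# The difference that remains after adding the pseudocount is much smaller
pseudo_log2_diff <- log2(0.1 + 1) - log2(0.01 + 1)

print(round(data.frame(fpkm, log2_fpkm), 3))
print(c(true = true_log2_diff, with_pseudocount = pseudo_log2_diff))
```

For a jitter plot comparing sequencing centers this compression is unlikely to change the picture, which is consistent with the note that it does not really matter in this case.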

jashapiro

I was about to approve this and say that this looks good to me, but then I realized that this may not actually be sufficient for "removing" the mitochondrial genes. The FPKM calculation will still be using the mitochondrial genes in the denominator, which will affect the expression values for all of the other genes before we get to this point. So we may want to remove the mitochondrial genes from the count files, then recalculate FPKM.

I think we can do this using the count matrices that we have, though I believe the original FPKM calculation occurs upstream, so we might not get precisely the same results if we reimplement the FPKM calculations. (So we should recalculate both with and without MT.) Before attempting this, we should probably discuss whether it is the right thing to spend time on at this moment.
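The denominator issue can be illustrated with a toy calculation. This is a sketch under stated assumptions: it uses the textbook FPKM definition (count × 10^9 / (gene length × total counts)), which may differ in detail from the upstream pipeline, and the gene names, counts, lengths, and the `^MT-` pattern are all hypothetical.

```r
# Hypothetical counts and gene lengths for one sample (illustrative only)
counts <- c(geneA = 500, geneB = 1500, `MT-CO1` = 8000)
lengths_bp <- c(geneA = 1000, geneB = 2000, `MT-CO1` = 1500)

# Textbook FPKM: the denominator includes the total counts across ALL genes supplied
fpkm <- function(counts, lengths_bp) {
  counts * 1e9 / (lengths_bp * sum(counts))
}

# Assumption for this sketch: mitochondrial genes are flagged by an "MT-" prefix
is_mito <- grepl("^MT-", names(counts))

# Approach 1: compute FPKM with mito genes present, then drop the mito rows
fpkm_then_drop <- fpkm(counts, lengths_bp)[!is_mito]

# Approach 2: drop mito genes from the counts first, then compute FPKM
drop_then_fpkm <- fpkm(counts[!is_mito], lengths_bp[!is_mito])

# Every remaining gene differs by the same factor: total counts / non-mito counts
ratio <- drop_then_fpkm / fpkm_then_drop
print(ratio)
```

In this toy sample the factor is 10000 / 2000 = 5 for both genes. The factor is constant within a sample but varies between samples with different mitochondrial fractions, which is why dropping rows after the fact is not equivalent to recalculating.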

analyses/transcriptomic-dimension-reduction/05-seq-center-mitochondrial-genes.Rmd

jashapiro · 2023-02-06T18:10:36Z

My main followup thought here is that we can probably do a rough version of this using the existing FPKM or TPM matrices. With TPM, I think we can re-sum the TPM values for each sample, excluding mitochondrial genes, and then divide each value by that sum/10^6 to rescale. That won't be quite right for FPKM, but it will probably be close enough. Still, I'd probably start with TPM for this notebook (calculating UMAP from the original TPM and then from the mito-excluded & rescaled TPM), as we are just looking to see if there is an effect of removing the mitochondrial genes. If we feel we need to go back to FPKM calculations later for better consistency with previous analyses, we can still do that.
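The rescaling described above can be sketched in base R as follows. This is a minimal illustration on a made-up matrix, not the code in this module: the gene names, sample IDs, and the `^MT-` pattern used to flag mitochondrial genes are all assumptions for the example.

```r
# Hypothetical TPM matrix: genes in rows, samples in columns.
# Each column sums to 1e6, as TPM values should.
tpm <- matrix(
  c(200000, 300000, 500000,
    100000, 100000, 800000),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "MT-CO1"), c("BS_1", "BS_2"))
)

# Assumption for this sketch: mitochondrial genes are flagged by an "MT-" prefix
is_mito <- grepl("^MT-", rownames(tpm))

# Drop mitochondrial genes
tpm_nomito <- tpm[!is_mito, , drop = FALSE]

# Per-sample scale factor: remaining TPM sum divided by 10^6
scale_factor <- colSums(tpm_nomito) / 1e6

# Divide each column by its scale factor so every sample sums to 10^6 again
tpm_rescaled <- sweep(tpm_nomito, 2, scale_factor, FUN = "/")

print(tpm_rescaled)
print(colSums(tpm_rescaled))
```

Because TPM values are proportions of length-normalized abundance, dropping genes and rescaling each sample back to 10^6 should match a TPM calculation restricted to the remaining genes. FPKM is normalized by total counts rather than by the sum of length-normalized values, which is why the same trick is only approximate there.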

sjspielman · 2023-02-07T21:57:14Z

@jashapiro We have some TPM UMAPs 😄
Here's the re-rendered notebook exploring the results. The plots don't look wildly different from FPKM which is a nice little sanity check to see. Overall trends look roughly the same with(out) mitochondrial genes.
05-seq-center-mitochondrial-genes.nb.html.zip

For this review, please let me know in particular any feedback on how I did the TPM prep (scripts being called from the right places, and the scratch/ export choice). I also kept the COUNT_THRESHOLD bash variable for filtering out low counts at 100, which seems more or less fine in the end.

jashapiro

This looks fine. Glad to see that renormalizing doesn't really have much of an effect, and I like the additional comment you added about how this doesn't correct for everything that we might see from protocol differences.

The html file in the repo is not up to date, however, so that needs to be added before approving. I don't know if other data files are up to date, but I imagine a full rerun will address both.

Minor comments that do not need to be addressed here:
I probably wouldn't have created a separate script for the TPM matrix filtering, as you already had most of the filtering code in the main script, but I don't see any major problem for this check.

If this were more than a one-off, I might also suggest not doing the gather/spread, but instead converting to a matrix and using matrix ops for efficiency, but for this it really doesn't matter.

It also seems odd to me that the umap tsv files have all of the metadata in them, but not worth the effort to change here.

jashapiro · 2023-02-07T22:13:16Z

analyses/transcriptomic-dimension-reduction/01-dimension-reduction.sh

+Rscript --vanilla scripts/run-dimension-reduction.R \
+  --expression ${NOMITO_TPM_FILE} \
+  --metadata ${METADATA} \
+  --remove_mito_genes \


I hope this isn't needed here? The mito genes should have already been removed, correct?

Ah yes, and the flag in general is not needed at all anymore if we're not doing this with FPKM. I'll sort that all out.

analyses/transcriptomic-dimension-reduction/scripts/prepare-tpm-for-umap.R

sjspielman · 2023-02-09T19:02:32Z

@jashapiro As of now, CI is passing through this module, so it's ready for another look! I updated the module to wholesale remove the remove_mito_genes flag since it's not needed at all anymore. I also re-ran the module to properly generate the HTMLs with the full data (not the CI data as was there before, woops!). Here’s the notebook:
05-seq-center-mitochondrial-genes.nb.html.zip

Since the tumor purity had a diff (see my last bit here #1658 (comment)), I also decided just to be safe to re-run that module as well. All the notebook diffs are HTML miscellany.

jashapiro

LGTM. Still not 100% sure what to do with this information, but that is separate from the analysis itself.

sjspielman added 20 commits November 3, 2022 12:06

Module readme

6bb2e6d

Initiated tumor purity module. Added run script and included in CI

044b928

module to readme

0170550

bash scripts in fact do not use .R extensions

04784e0

Added filtering distributions

f52472d

Merge branch 'master' into initiate-tumor-purity-module

a1495ec

Merge branch 'master' into initiate-tumor-purity-module

4cb29e8

Updated tumor purity to include extraction_type and output results for potential use in other modules

7b2109c

Add results TSV into readme since this is likely to be used by other modules

174b90c

Added another subsection and result file for thresholding PER cancer group

260a123

Add option to remove mitochondrial genes from dimension reduction and generate associated files for the new 'rsem_stranded_no-mito_log' filename_lead value

f91caff

Accidentally had used polya and found other bug. Fixed code and regenerated stranded rsem without mito umap

a9fca41

notebook to plot UMAP without mito as well as mito fpkm jitter

a27c285

plot styling

b12383e

merge in master and fix conflicts

977a86c

remove sneaky zipped html

b818e98

remove old result files from when I started this branch

f6fea2f

add notebook to module bash scripts

b724636

small title tweak in nb

845bd18

No need to make the plots with this data, and no need to run t-sne

71523e3

sjspielman requested review from jaclyn-taroni and jashapiro January 26, 2023 14:33

sjspielman added the revision Related to manuscript revisions label Jan 26, 2023

jashapiro reviewed Jan 27, 2023

analyses/transcriptomic-dimension-reduction/01-dimension-reduction.sh (outdated, resolved)

jashapiro reviewed Jan 27, 2023

sjspielman and others added 5 commits January 30, 2023 10:27

Merge branch 'master' into umap-sans-mito

1da693c

Apply suggestions from code review

677b9f4

Co-authored-by: Joshua Shapiro

add conclusions and re-render

54ef86d

updated result file with properly used flag

cc82609

small comment update

efb1936

sjspielman requested a review from jashapiro January 30, 2023 16:03

jashapiro reviewed Jan 30, 2023

analyses/transcriptomic-dimension-reduction/05-seq-center-mitochondrial-genes.Rmd (outdated, resolved)

Update analyses/transcriptomic-dimension-reduction/05-seq-center-mitochondrial-genes.Rmd

f3deb28

Co-authored-by: Joshua Shapiro

sjspielman added 7 commits February 7, 2023 12:07

Merge branch 'master' into umap-sans-mito

a89ec72

merge reintroduced a straggling backtick that is now re-purged

7fb7286

script for TPM collapsing and filtering to nomito. We may not need to collapse though

62f57ca

updated script to no longer collapse

60b71b7

We now have TPM results, not collapsed, and an updated notebook to explore those UMAPs

5d0aacc

woops they were both nomito. fixed, and conclusions are the same

738e303

Add tpm without mito to ci

88a994f

sjspielman requested a review from jashapiro February 7, 2023 21:57

jashapiro reviewed Feb 8, 2023

analyses/transcriptomic-dimension-reduction/scripts/prepare-tpm-for-umap.R (outdated, resolved)

sjspielman and others added 8 commits February 9, 2023 10:17

Merge branch 'master' into umap-sans-mito

752850c

Update analyses/transcriptomic-dimension-reduction/scripts/prepare-tpm-for-umap.R

2388b73

Co-authored-by: Joshua Shapiro

Removed remove_mito_genes flag and re-run module with correct data

9a8be2a

Add 05 notebook to README

377375c

need to remove flag from CI script

7d8023e

Removed description of collapse-rnaseq approach from comments since it is no longer used

6879f0d

ACTUALLY use the right data - the scratch was outdated, now it is fixed

05d53f4

reran to be safe

1e73b21

Merge branch 'master' into umap-sans-mito

bc61fcb

sjspielman requested review from jashapiro and removed request for jaclyn-taroni February 9, 2023 19:02

jashapiro approved these changes Feb 10, 2023

sjspielman merged commit f519494 into AlexsLemonade:master Feb 15, 2023

sjspielman deleted the umap-sans-mito branch February 15, 2023 14:29
