This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: Investigate handling of t-SNE in transcriptomic-dimension-reduction module #716

Open
cbethell opened this issue Jun 10, 2020 · 5 comments

Comments

@cbethell
Contributor

What analysis module should be updated and why?

As discovered in PR #704, the t-SNE results in analyses/transcriptomic-dimension-reduction/results do not appear to be reproducible: they change across operating systems (when run in Docker with a seed set) but stay the same within any individual OS.

What changes need to be made? Please provide enough detail for another participant to make the update.

As noted on PR #704, the following has been determined:

  • The seed must be working as intended, because the PCA and t-SNE results stay the same within each individual OS
  • The input files have not changed since the dimension reduction script was last run and merged into master
  • Only the PCA and t-SNE result files appear to have reproducibility issues (the UMAP result files do not)
  • There has been no evidence of updates to the Rtsne package
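
To put a number on "the results differ", one option is to compare two score tables column by column. The following is a minimal sketch, not the module's own tooling: the column names `X1` and `X2`, the tab-delimited layout, and the inline tables are assumptions made up for illustration, and it assumes both tables list the same samples in the same order.

```python
import csv
import io


def read_scores(handle, delimiter="\t"):
    """Read a delimited score table into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def max_abs_diff(rows_a, rows_b, columns):
    """Return the largest absolute difference per column between two tables.

    Assumes both tables list the same samples in the same order.
    """
    if len(rows_a) != len(rows_b):
        raise ValueError("tables have different numbers of rows")
    diffs = {col: 0.0 for col in columns}
    for a, b in zip(rows_a, rows_b):
        for col in columns:
            diffs[col] = max(diffs[col], abs(float(a[col]) - float(b[col])))
    return diffs


# Tiny made-up tables standing in for two runs of the same script.
run_one = "sample\tX1\tX2\nS1\t1.0\t-2.0\nS2\t0.5\t3.0\n"
run_two = "sample\tX1\tX2\nS1\t1.0\t-2.0\nS2\t0.5\t3.25\n"

diffs = max_abs_diff(
    read_scores(io.StringIO(run_one)),
    read_scores(io.StringIO(run_two)),
    columns=["X1", "X2"],
)
print(diffs)  # {'X1': 0.0, 'X2': 0.25}
```

A per-column maximum helps separate tiny floating-point noise from a genuinely different embedding, which a plain file diff cannot do.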

That being said, when looking into this issue, one might look into:

  • Whether different versions of Docker can cause differing results across operating systems
  • The prcomp function, since the Rtsne function performs an initial reduction using prcomp
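
When looking at the PCA step, one quick check is whether two sets of scores differ only by sign. Principal components are defined only up to sign, so a flipped component is a harmless difference, whereas any other mismatch points to a real numerical discrepancy. This is a sketch of that check, not a diagnosis of this issue: the function name, tolerance, and example values are all hypothetical.

```python
def column_relation(col_a, col_b, tol=1e-8):
    """Classify two score columns as 'same', 'sign-flipped', or 'different'."""
    if len(col_a) != len(col_b):
        raise ValueError("columns have different lengths")
    if all(abs(a - b) <= tol for a, b in zip(col_a, col_b)):
        return "same"
    if all(abs(a + b) <= tol for a, b in zip(col_a, col_b)):
        return "sign-flipped"
    return "different"


# Made-up scores for one component from two hypothetical runs.
pc_run_a = [0.8, -1.2, 0.4]
pc_flipped = [-0.8, 1.2, -0.4]
pc_shifted = [0.8, -1.25, 0.4]

print(column_relation(pc_run_a, pc_run_a))    # same
print(column_relation(pc_run_a, pc_flipped))  # sign-flipped
print(column_relation(pc_run_a, pc_shifted))  # different
```

If every mismatched PCA column turned out to be sign-flipped, the PCA difference would be cosmetic; the t-SNE difference would still need its own explanation.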

What input data should be used? Which data were used in the version being updated?

No additional input data should be needed.

When do you expect the revised analysis will be completed?

~ 2 days

Who will complete the updated analysis?

Likely someone at the CCDL

@sjspielman
Member

The OS's in question are presumably mac vs linux, yes? A possible culprit could be clang vs gcc, or similar, based on personal experience only. Will start some sleuthing over here.

@jashapiro
Member

The OS's in question are presumably mac vs linux, yes? A possible culprit could be clang vs gcc, or similar, based on personal experience only. Will start some sleuthing over here.

Not quite so simple. As far as I am aware, the differences are all occurring on Macs, but more to the point it is all within Docker images, so compiler versions should be the same. Should. It's a mystery.

@sjspielman
Member

sjspielman commented Jun 10, 2020

It's a mystery.

Until we solve it, I will blame Bioconductor, solely for personal comfort.

@sjspielman
Member

sjspielman commented Jun 11, 2020

I don't think the Docker version itself should make a difference, but for what it's worth I'm rebuilding my image now on this Docker version:
[Screenshot: Screen Shot 2020-06-11 at 9 28 09 AM]

Edit: As needed, here are Docker's release notes. I am using the most up-to-date version of Docker Desktop, released on 5/27/20. The score files in master were merged in on 4/4/20, so they must have been generated with a different Docker release. It would be extra bad if Docker itself were leading to this discrepancy, and I have to assume this is not the cause (otherwise too depressing). Still worth ruling out.

@sjspielman
Member

sjspielman commented Jun 11, 2020

Approach: two Docker builds using the aforementioned Docker Desktop version, built as:

docker build -t pbtarocker-cache --pull .
docker build -t pbtarocker-nocache --pull --no-cache .

They give the exact same values as one another, but their PCA and t-SNE scores differ from master. UMAP is the same as master.

Conclusion: Likely unrelated to docker caching. This is expected and good. Next up: related to docker version? Will go download a slightly older Desktop docker and rebuild (from cache!) and check it out there. Goal: this also isn't the culprit. EDIT: My computer will not allow me to download any older version from March or earlier due to security risks. I can't investigate this one.
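
A comparison like the one above (one build's results against another's, or against master) can be scripted by hashing the result files. The sketch below is illustrative only: the directory layout and the file names `pca_scores.tsv` and `umap_scores.tsv` are hypothetical stand-ins, and the file contents are made up so the example runs on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_dirs(dir_a, dir_b):
    """Map each file name present in both directories to True if byte-identical."""
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    shared = sorted(
        p.name for p in dir_a.iterdir()
        if p.is_file() and (dir_b / p.name).is_file()
    )
    return {
        name: sha256_of(dir_a / name) == sha256_of(dir_b / name)
        for name in shared
    }


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical result directories: a fresh build versus what is in master.
    build_dir = Path(tmp) / "build"
    master_dir = Path(tmp) / "master"
    build_dir.mkdir()
    master_dir.mkdir()

    # Made-up contents: PCA scores differ, UMAP scores match.
    (build_dir / "pca_scores.tsv").write_text("X1\tX2\n1.0\t2.0\n")
    (master_dir / "pca_scores.tsv").write_text("X1\tX2\n1.0\t2.5\n")
    (build_dir / "umap_scores.tsv").write_text("X1\tX2\n0.1\t0.2\n")
    (master_dir / "umap_scores.tsv").write_text("X1\tX2\n0.1\t0.2\n")

    result = compare_dirs(build_dir, master_dir)

print(result)  # {'pca_scores.tsv': False, 'umap_scores.tsv': True}
```

Byte-level hashes only answer "identical or not"; files that differ would still need a numeric comparison to see how far apart they are.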
