
[docs] Add PyTorch loaders article release #1214

Open · pablo-gar wants to merge 9 commits into main
Conversation

@pablo-gar (Collaborator) commented Jun 28, 2024

Images don't render as plain Markdown; they are encoded for the MyST parser.
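For context, a minimal sketch of how an image is encoded for MyST, using the `figure` directive. The image path is a hypothetical placeholder; the `:align:` and `:figwidth:` options match those visible in the review context further down:

````markdown
```{figure} ./20240702-pytorch-figure.png
:align: center
:figwidth: 80%

Figure caption text goes here.
```
````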

@pablo-gar pablo-gar requested a review from ebezzi June 28, 2024 20:16
pablo-gar and others added 2 commits June 28, 2024 13:48
Co-authored-by: Emanuele Bezzi <[email protected]>
docs/articles/2024/20240702-pytorch.md — 3 outdated review threads, resolved

We have made improvements to the loaders to reduce the number of data transformations required between data fetching and model training. One important change is to encode the expression data as a dense matrix immediately after it is retrieved from disk/cloud.

In our benchmarks, we found that densifying data increases training speed ~3X while maintaining relatively constant memory usage (Figure 3). However, we still allow users to decide whether to process the expression data in sparse or dense format via the #TODO ask ebezzi to include name of parameter.
A reviewer (Member) commented:
The parameter is method, but I believe @ryan-williams wanted to change it?

pablo-gar and others added 3 commits June 28, 2024 14:13

codecov bot commented Jun 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.17%. Comparing base (fc0281b) to head (a33479a).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1214      +/-   ##
==========================================
+ Coverage   91.11%   91.17%   +0.06%     
==========================================
  Files          77       77              
  Lines        5922     5963      +41     
==========================================
+ Hits         5396     5437      +41     
  Misses        526      526              
| Flag | Coverage Δ |
|------|------------|
| unittests | 91.17% <ø> (+0.06%) ⬆️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


A reviewer (Member) left a suggested change on the densification paragraph:
Suggested change:

- In our benchmarks, we found that densifying data increases training speed ~3X while maintaining relatively constant memory usage (Figure 3). However, we still allow users to decide whether to process the expression data in sparse or dense format via the #TODO ask ebezzi to include name of parameter.
+ In our benchmarks, we found that densifying data increases training speed ~3X while maintaining relatively constant memory usage (Figure 3). However, we still allow users to decide whether to process the expression data in sparse or dense format via the `method` parameter.
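For readers following the thread, a minimal sketch of how the loader is driven, using the `ExperimentDataPipe` and `experiment_dataloader` APIs from `cellxgene_census.experimental.ml`. The dense/sparse flag is omitted here because its name was still under discussion at review time; dense processing is described above as the default:

```python
import cellxgene_census
import cellxgene_census.experimental.ml as census_ml
import tiledbsoma as soma

census = cellxgene_census.open_soma()

# Build a data pipe over a slice of the mouse Census; per the article,
# expression data is densified immediately after it is read, which the
# benchmarks report is ~3X faster for training at similar memory usage.
experiment_datapipe = census_ml.ExperimentDataPipe(
    census["census_data"]["mus_musculus"],
    measurement_name="RNA",
    X_name="raw",
    obs_query=soma.AxisQuery(value_filter="tissue_general == 'liver'"),
    obs_column_names=["cell_type"],
    batch_size=128,
    shuffle=True,
)

# Wrap the data pipe in the provided DataLoader helper and iterate batches.
experiment_dataloader = census_ml.experiment_dataloader(experiment_datapipe)
for X_batch, obs_batch in experiment_dataloader:
    ...  # feed the batch to the model's training step
```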

Context, from the article's MyST figure directive:

:align: center
:figwidth: 80%

**Figure 4. Trial scVI training run with default parameters of the Census PyTorch loaders, highlighting increased speed of dense vs sparse data processing.** Training was done on X mouse cells for 1 epoch in X EC2 instance #TODO ask ebezzi for further details.
Suggested change:

- **Figure 4. Trial scVI training run with default parameters of the Census PyTorch loaders, highlighting increased speed of dense vs sparse data processing.** Training was done on X mouse cells for 1 epoch in X EC2 instance #TODO ask ebezzi for further details.
+ **Figure 4. Trial scVI training run with default parameters of the Census PyTorch loaders, highlighting increased speed of dense vs sparse data processing.** Training was done on 5684805 mouse cells for 1 epoch on a g4dn.16xlarge EC2 machine.


For maximum flexibility, users can provide custom encoders for the cell metadata, enabling custom transformations of, or interactions between, different metadata variables.

To use custom encoders, instantiate the desired encoder via the `Encoder` (#TODO ebezzi to insert a link to the docs) class and pass it to the `encoders` parameter of the `ExperimentDataPipe`.
Suggested change:

- To use custom encoders, instantiate the desired encoder via the `Encoder` (#TODO ebezzi to insert a link to the docs) class and pass it to the `encoders` parameter of the `ExperimentDataPipe`.
+ To use custom encoders, instantiate the desired encoder via the [Encoder](https://chanzuckerberg.github.io/cellxgene-census/_autosummary/cellxgene_census.experimental.ml.encoders.Encoder.html#cellxgene_census.experimental.ml.encoders.Encoder) class and pass it to the `encoders` parameter of the `ExperimentDataPipe`.

Note that this link will be broken until we release.
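For readers following along, a minimal sketch of what a custom encoder might look like. The abstract interface assumed here (`fit`/`transform`/`inverse_transform` plus `name` and `classes_` properties) and the `JointEncoder` class are illustrative, not taken from this PR; consult the linked `Encoder` API docs for the exact contract:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder as SkLabelEncoder

# Assumption: the Encoder base class lives here, per the docs link above.
from cellxgene_census.experimental.ml.encoders import Encoder


class JointEncoder(Encoder):
    """Hypothetical encoder mapping the interaction of two obs columns
    (e.g. cell_type x tissue_general) to a single integer label."""

    def __init__(self, col_a: str, col_b: str):
        self.col_a = col_a
        self.col_b = col_b
        self._encoder = SkLabelEncoder()

    def _join(self, df: pd.DataFrame) -> pd.Series:
        # Concatenate the two columns into one joint categorical value.
        return df[self.col_a].astype(str) + "//" + df[self.col_b].astype(str)

    def fit(self, obs: pd.DataFrame) -> None:
        # Learn the vocabulary of joint values (method name assumed).
        self._encoder.fit(self._join(obs).unique())

    def transform(self, df: pd.DataFrame):
        return self._encoder.transform(self._join(df))

    def inverse_transform(self, encoded_values):
        return self._encoder.inverse_transform(encoded_values)

    @property
    def name(self) -> str:
        return f"{self.col_a}_x_{self.col_b}"

    @property
    def classes_(self):
        return self._encoder.classes_
```

The instance would then be passed when constructing the data pipe, per the paragraph above, e.g. `encoders=[JointEncoder("cell_type", "tissue_general")]`.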
