[docs] Add PyTorch loaders article release #1214
base: main
Conversation
Co-authored-by: Emanuele Bezzi <[email protected]>
> We have made improvements to the loaders to reduce the amount of data transformations required from data fetching to model training. One such important change is to encode the expression data as a dense matrix immediately after the data is retrieved from disk/cloud.
>
> In our benchmarks, we found that densifying data increases training speed ~3X while maintaining relatively constant memory usage (Figure 3). However we still allow users to decide whether to process the expression data in sparse or dense format via the #TODO ask ebezzi to include name of parameter.
The parameter is `method`, but I believe @ryan-williams wanted to change it?
**Codecov Report**

All modified and coverable lines are covered by tests ✅

Coverage diff against `main` (#1214):

| | main | #1214 | +/- |
| --- | --- | --- | --- |
| Coverage | 91.11% | 91.17% | +0.06% |
| Files | 77 | 77 | |
| Lines | 5922 | 5963 | +41 |
| Hits | 5396 | 5437 | +41 |
| Misses | 526 | 526 | |
Suggested change — fill in the #TODO with the parameter name:

> In our benchmarks, we found that densifying data increases training speed ~3X while maintaining relatively constant memory usage (Figure 3). However we still allow users to decide whether to process the expression data in sparse or dense format via the `method` parameter.
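For context, a minimal sketch of what this looks like from the user's side, assuming the parameter keeps the name `method` (per the thread above it may still be renamed before release); the rest follows the existing `ExperimentDataPipe` signature:

```python
import cellxgene_census
import cellxgene_census.experimental.ml as census_ml

# Open the Census and pick an experiment (mouse data, as in the benchmarks).
census = cellxgene_census.open_soma()
experiment = census["census_data"]["mus_musculus"]

datapipe = census_ml.ExperimentDataPipe(
    experiment,
    measurement_name="RNA",
    X_name="raw",
    obs_column_names=["cell_type"],
    batch_size=128,
    shuffle=True,
    # Hypothetical dense/sparse switch discussed in this thread; the name
    # `method` is not final. "dense" densifies right after the read, which
    # is what produced the ~3X speedup reported in Figure 3.
    method="dense",
)
```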
> :align: center
> :figwidth: 80%
>
> **Figure 4. Trial scVI training run with default parameters of the Census Pytorch loaders, highlighting increased speed of dense vs sparse data processing.** Training was done on X mouse cells for 1 epoch in X EC2 instance #TODO ask ebezzi for further details.

Suggested change — fill in the training details:

> **Figure 4. Trial scVI training run with default parameters of the Census Pytorch loaders, highlighting increased speed of dense vs sparse data processing.** Training was done on 5684805 mouse cells for 1 epoch on a g4dn.16xlarge EC2 machine.
> For maximum flexibility, users can provide custom encoders for the cell metadata enabling custom transformations or interactions between different metadata variables.
>
> To use custom encoders you need to instantiate the desired encoder via the `Encoder` (#TODO ebezzi to insert a link to the docs) class and pass it to the `encoders` parameter of the `ExperimentDataPipe`.

Suggested change — link the `Encoder` docs:

> To use custom encoders you need to instantiate the desired encoder via the [Encoder](https://chanzuckerberg.github.io/cellxgene-census/_autosummary/cellxgene_census.experimental.ml.encoders.Encoder.html#cellxgene_census.experimental.ml.encoders.Encoder) class and pass it to the `encoders` parameter of the `ExperimentDataPipe`.
Note that this link will be broken until we release.
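To sanity-check the docs here, a sketch of what a custom encoder could look like. This assumes the `Encoder` interface exposes `fit`/`transform`/`inverse_transform` plus `name` and `columns` properties; the exact abstract methods should be verified against the released docs, and `PairEncoder` itself is purely illustrative:

```python
from typing import List

import numpy as np
import pandas as pd
from cellxgene_census.experimental.ml.encoders import Encoder


class PairEncoder(Encoder):
    """Illustrative encoder: maps (tissue, cell type) pairs to integer codes."""

    def __init__(self, col_a: str, col_b: str):
        self._cols = [col_a, col_b]
        self._codes: dict = {}

    def fit(self, obs: pd.DataFrame) -> None:
        # Build the vocabulary of observed pairs once over the full obs frame.
        pairs = (obs[self._cols[0]].astype(str) + "|" + obs[self._cols[1]].astype(str)).unique()
        self._codes = {p: i for i, p in enumerate(sorted(pairs))}

    def transform(self, df: pd.DataFrame) -> np.ndarray:
        # Encode each row's pair as its integer code.
        pairs = df[self._cols[0]].astype(str) + "|" + df[self._cols[1]].astype(str)
        return pairs.map(self._codes).to_numpy()

    def inverse_transform(self, encoded_values: np.ndarray) -> np.ndarray:
        # Recover the original pair strings from integer codes.
        inv = {i: p for p, i in self._codes.items()}
        return np.array([inv[v] for v in encoded_values])

    @property
    def name(self) -> str:
        return "tissue_cell_type"

    @property
    def columns(self) -> List[str]:
        return self._cols
```

It would then be passed as `encoders=[PairEncoder("tissue_general", "cell_type")]` when constructing the `ExperimentDataPipe`.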
Images don't render in markdown; they are encoded for the MyST parser.
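For reference, the `:align:`/`:figwidth:` options quoted above belong to a MyST `figure` directive, roughly like this (the image path and caption placement are placeholders):

```{figure} ./figures/figure_4.png
:align: center
:figwidth: 80%

**Figure 4.** Caption text goes here.
```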