Documentation for pl.LightningModule that includes many nn.Modules #28

Open

turian opened this issue Dec 15, 2020 · 7 comments

turian commented Dec 15, 2020

I have a pl.LightningModule (pytorch-lightning) that includes many nn.Modules.

It's not obvious from the documentation how I can profile all of the LightningModule's tensors and those of its subordinate nn.Modules. Could you please provide an example?

turian (Author) commented Dec 15, 2020

Here is an example:

https://colab.research.google.com/github/PytorchLightning/pytorch-lightning/blob/master/notebooks/01-mnist-hello-world.ipynb

In my code (not the colab above, but a similar style), I don't OOM when I create the model. I OOM when I run

trainer.fit(model)

How do I memory-profile to find out why I OOM?

Stonesjtu (Owner) commented

Thanks for reporting. I'll investigate the integration with pytorch-lightning this weekend.

But in principle, the only thing that needs to be done is to add the forward function to the line profiler.
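
For context, a minimal sketch of what that looks like with pytorch_memlab's LineProfiler (the Tiny module is a hypothetical stand-in; registering the forward function this way is an assumption based on the README's free-function examples):

import torch
import torch.nn as nn
from pytorch_memlab import LineProfiler

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 256)

    def forward(self, x):
        return self.fc(x)

model = Tiny().cuda()
# register Tiny.forward so its lines are tracked, then run a forward pass
with LineProfiler(Tiny.forward) as prof:
    model(torch.randn(8, 256, device="cuda"))
prof.display()  # prints per-line CUDA memory usage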

Stonesjtu (Owner) commented

It looks like our current implementation cannot profile the detailed memory usage inside an nn.Module. However, you can work around this by simply defining a dummy container Module like:

import pytorch_lightning as pl
import torch.nn as nn
from pytorch_memlab import profile

class Net(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # example layer args; substitute your real submodules here
        self.conv1 = nn.Conv1d(16, 32, kernel_size=3)

    @profile  # line-by-line memory stats for each forward call
    def forward(self, input):
        out = self.conv1(input)
        return out

turian (Author) commented Dec 19, 2020

@Stonesjtu if I have an nn.Module that contains other nn.Modules (which in turn contain other nn.Modules), do I add the @profile decorator to every nn.Module's forward to see what is happening? Thank you for the help.

Stonesjtu (Owner) commented

A common workflow is to profile top-down. Usually 2 or 3 @profile decorators are enough to give you overall memory-consumption statistics.
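
For illustration, a sketch of that top-down workflow (the module names are hypothetical): start with @profile on the top-level forward, then push a second @profile into whichever submodule dominates the first report:

import torch.nn as nn
from pytorch_memlab import profile

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 256)

    @profile  # level 2: added after level 1 pointed at the encoder
    def forward(self, x):
        return self.fc(x)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.head = nn.Linear(256, 10)

    @profile  # level 1: overall picture of the top-level forward
    def forward(self, x):
        return self.head(self.encoder(x))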

turian (Author) commented Apr 22, 2023

@Stonesjtu I wanted to ping this issue to see if there is a better way to use memlab with lightning now.

profPlum commented

@turian Does the MemReporter work for you? The docs say it works recursively on more complicated nn.Modules.
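
For reference, a minimal sketch of that usage, following the README (the model here is a stand-in); calling report() after backward should also account for gradient tensors:

import torch
from pytorch_memlab import MemReporter

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).cuda()
reporter = MemReporter(model)

out = model(torch.randn(32, 1024, device="cuda")).sum()
out.backward()
reporter.report()  # per-tensor usage, traversing submodules recursively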
