Extra slow pytorch imports (~30s) #889

Open · kpister opened this issue Nov 17, 2024 · 7 comments · May be fixed by #891

kpister commented Nov 17, 2024

Describe the bug
PyTorch-related imports take roughly 15x longer to resolve (~30s vs ~2s) when running under Scalene than under plain Python.

To Reproduce
I have a simple test.py file containing only `import torch` (a timed variant is sketched below).

1. Run `scalene test.py` and wait ~30s for the report to finish.
2. Run `python test.py` and wait ~2s.
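
A minimal sketch of the reproducer, assuming the same environment as above; it extends test.py with nothing but a timer so the import cost is visible under both `python test.py` and `scalene test.py`:

```python
# test.py -- the reproducer, extended only with a timer that prints the import cost.
import time

start = time.perf_counter()
import torch  # ~2s under plain Python vs ~30s under Scalene, per this report
print(f"import torch took {time.perf_counter() - start:.1f}s")
```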

Screenshots
(Screenshot of the Scalene profile attached in the original issue.)

**Versions**

  • OS: Debian 11
  • Python: 3.11.10
  • Scalene: 1.5.48
  • Torch: 2.5.1+cu121

I enabled GPU profiling in Scalene.

sternj self-assigned this Nov 18, 2024
emeryberger (Member) commented:
Thanks for the report. We've been able to reproduce this locally and are looking into it.

emeryberger (Member) commented:
In the interim, as a work-around, you can specify `--cpu --gpu` (the culprit at the moment appears to be the memory/copy profiling).
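
For completeness, a hedged sketch of invoking the work-around, assuming Scalene is installed and the test.py above is on hand (wrapped in Python's subprocess only to keep all snippets in one language; running the command directly in a shell works the same way):

```python
# Run Scalene with only CPU and GPU profiling enabled, leaving the
# memory/copy profiler (the suspected culprit) switched off.
import subprocess

subprocess.run(["scalene", "--cpu", "--gpu", "test.py"], check=True)
```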

sternj (Collaborator) commented Nov 18, 2024

Likewise reproduced, also with around a 50x slowdown.

sternj (Collaborator) commented Nov 19, 2024

On torch==2.5.1, disabling the settrace call reduces the runtime from ~100s to ~4s. I'm going to check whether the regression was introduced by a particular Scalene commit or by a particular PyTorch version.
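
To make that cost concrete, here is a minimal, self-contained sketch (not Scalene's code) of why an installed trace callback slows imports: sys.settrace is the Python-level counterpart of PyEval_SetTrace, and once a tracer is installed the interpreter calls back into it for events raised by every frame, including all the library code executed during an import. The module imported here is `json`, used purely as a small stand-in for `torch`:

```python
import importlib
import sys
import time


def tracer(frame, event, arg):
    # Invoked for every 'call' event; returning the tracer keeps local tracing
    # enabled in that frame, so 'line' and 'return' events fire there as well.
    return tracer


def time_import(module_name: str, with_trace: bool) -> float:
    """Time a fresh import of module_name, optionally under the no-op tracer."""
    # Drop the module and its submodules so the import machinery re-executes them.
    for name in [m for m in list(sys.modules)
                 if m == module_name or m.startswith(module_name + ".")]:
        del sys.modules[name]
    if with_trace:
        sys.settrace(tracer)
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    sys.settrace(None)
    return elapsed


print(f"plain import:  {time_import('json', with_trace=False):.4f}s")
print(f"traced import: {time_import('json', with_trace=True):.4f}s")
```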

sternj (Collaborator) commented Nov 19, 2024

I do not see this performance degradation at commit 5e457916606b1ebc. Bisecting.

sternj (Collaborator) commented Nov 19, 2024

The problem was introduced in commit b9ad0a56582cf4d.

sternj (Collaborator) commented Nov 20, 2024

I've been looking into this more, and the root problem has to do with how and when the interpreter calls the tracing function.

At the moment, it seems that the C logic that decides when to disable the PyEval_SetTrace callback isn't actually disabling it for library calls. This incurs a function-call overhead unconditionally for every single event (opcode, line, call, and return) in every part of every library executed anywhere in the program. Since importing PyTorch executes a lot of code, that overhead adds up incredibly quickly.

I'm making headway on figuring out precisely how to do this disabling; CPython checks in several different places and in several different ways whether a trace callback exists and whether to execute it. This is governed by both the PyThreadState struct and the _PyCFrame struct, with much of that logic in ceval.c. I think I'll have a solution by the end of tomorrow.
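
As a rough Python-level illustration of the idea (not Scalene's actual fix, which lives in its C code around PyEval_SetTrace and the structures named above), a trace function can return None from the 'call' event for frames that belong to library code, so the interpreter stops delivering per-line and return events for those frames:

```python
import sys
import sysconfig

# Directory where third-party libraries (e.g. torch) are installed.
_LIB_PREFIX = sysconfig.get_paths()["purelib"]


def selective_tracer(frame, event, arg):
    if event == "call" and frame.f_code.co_filename.startswith(_LIB_PREFIX):
        return None  # disable local tracing (line/return events) in this library frame
    # ... profiling logic for application frames would go here ...
    return selective_tracer  # keep tracing application frames


sys.settrace(selective_tracer)
```

Even then, the global 'call' callback still fires once per call into library code, which is why the real fix has to manipulate the interpreter-level state (PyThreadState / _PyCFrame) that ceval.c consults before invoking the callback at all.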

sternj linked a pull request (#891) on Nov 24, 2024 that will close this issue.