Question: Re-using interactive shell for multiple notebook executions #97
hi, we haven't explored this. are you running the notebook iterations one at a time? if you are, and you're seeing memory usage increase, that might be a bug. a user reported something similar and fixed it (#75), but there might be other leaks. a quick way to fix this is to run ploomber-engine via the subprocess module; this way you'll ensure that each call completely wipes out memory.
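The subprocess suggestion can be sketched as follows, assuming ploomber-engine's `execute_notebook(input_path, output_path)` entry point is available in the child interpreter. Each notebook runs in a fresh Python process, so all of its memory goes back to the operating system when the child exits, at the cost of paying interpreter start-up and imports on every call. `run_isolated` and `engine_snippet` are hypothetical helper names, not part of ploomber-engine.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(code: str, timeout: Optional[float] = None) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter; its memory is freed when it exits."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect the child's stdout/stderr
        text=True,            # decode output as str instead of bytes
        check=True,           # raise CalledProcessError if the child fails
        timeout=timeout,
    )


def engine_snippet(input_path: str, output_path: str) -> str:
    """Build the code the child runs (assumes ploomber-engine is installed there)."""
    return (
        "from ploomber_engine import execute_notebook\n"
        f"execute_notebook({input_path!r}, {output_path!r})\n"
    )
```

For example, `run_isolated(engine_snippet("test_001.ipynb", "test_001.out.ipynb"))` would execute one (hypothetical) notebook per child process, and a failing notebook surfaces as `subprocess.CalledProcessError` in the parent.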
Thanks for the quick reply, Eduardo. Our specific use case is that we use Jupyter notebooks for our application testing and have wrapped ploomber engine in a test harness, so we are running ~3000 separate notebooks one after the other from a single process. We see the memory climb steadily, and what actually brought the issue to our attention was the 3-4 minute lag between the application finishing all the tests and the process exiting. During this period the process memory drops, so Python's memory cleanup carries a significant time cost on top of the memory use itself.
We don’t want to run in a subprocess because that slows us down. We need these tests to be as fast as possible (this is what brought us to ploomber in the first place).
We will continue to investigate and let you know what we find.
Thanks again. And thanks for making ploomber engine.
Jim
On Apr 12, 2024, at 9:32 PM, Eduardo Blancas wrote:
hi, we haven't explored this.
are you running the notebook iterations one at a time? if you are, and you're seeing memory usage increase, that might be a bug. a user reported something similar and fixed it (#75), but there might be other leaks.
a quick way to fix this is to run ploomber-engine via the subprocess module; this way you'll ensure that each call completely wipes out memory.
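To narrow down where the steady growth over thousands of runs comes from, the standard library's `tracemalloc` can compare allocation snapshots taken before and after a batch of repeated runs. This is a minimal sketch of one way to instrument a harness like the one described, not something proposed in the thread; `top_growth` is a hypothetical helper and `run_once` stands for one notebook execution. It runs once as a warm-up so that one-time costs such as imports and caches land in the baseline. Note that `tracemalloc` only sees allocations made through Python's memory allocators, and tracing slows execution, so it is best run on a sample of notebooks.

```python
import gc
import tracemalloc
from typing import Callable, List

# Ignore allocations made by tracemalloc itself while taking snapshots.
_FILTERS = [tracemalloc.Filter(False, tracemalloc.__file__)]


def top_growth(
    run_once: Callable[[], None], iterations: int = 5, limit: int = 5
) -> List[tracemalloc.StatisticDiff]:
    """Run `run_once` repeatedly and return the source lines whose memory grew most."""
    tracemalloc.start()
    try:
        run_once()  # warm-up: first-run costs go into the baseline
        gc.collect()
        baseline = tracemalloc.take_snapshot().filter_traces(_FILTERS)
        for _ in range(iterations):
            run_once()
        gc.collect()  # drop cyclic garbage so only retained memory remains
        current = tracemalloc.take_snapshot().filter_traces(_FILTERS)
    finally:
        tracemalloc.stop()
    # compare_to sorts by the absolute size difference, largest first.
    diffs = current.compare_to(baseline, "lineno")
    return [d for d in diffs if d.size_diff > 0][:limit]
```

Usage would look like `for stat in top_growth(run_one_notebook, iterations=20): print(stat)`, where `run_one_notebook` is a hypothetical wrapper around the harness's ploomber-engine call; each printed line shows a file, a line number, and how much memory allocated there was still alive after the batch.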
yeah, I guess there's a memory leak somewhere. another thing you can do to speed things up is turn off our anonymous telemetry:
I'm not sure this will have a big effect, but let me know if it helps (we're thinking of removing it completely).
We have a scenario where we are running the same notebook thousands of times, and memory increases significantly as we progress. Our initial investigation suggests the incremental memory comes primarily from importing pandas and numpy in the notebooks. We were thinking that if we imported pandas and numpy in client._shell and then reused that shell, we might be able to keep memory under control. I am looking into this now, but wondered whether it is something you have already explored or even already support. Thank you.
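The idea of preloading pandas and numpy once and reusing one execution context can be illustrated with a minimal, standard-library-only sketch. This is not ploomber-engine's `client._shell` and not IPython's `InteractiveShell`; `ReusableShell` is a hypothetical class, and representing a notebook as a list of code-cell source strings is a simplification. It imports the heavy modules once and gives each run a fresh namespace seeded only with those modules, so variables from one notebook are not carried into the next. One point worth checking during the investigation: Python caches imported modules in `sys.modules`, so within one process a repeated `import pandas` reuses the already-loaded module instead of loading it again. Memory that keeps growing run after run may therefore come from objects that each run leaves reachable rather than from the imports themselves.

```python
import importlib
from types import ModuleType
from typing import Dict, List


class ReusableShell:
    """Hypothetical reusable execution context (a sketch, not ploomber-engine's API).

    Modules are imported once, at construction time. Each call to `run` gets a
    fresh globals dict seeded only with those modules, so user variables from a
    previous notebook are not visible in the next one.
    """

    def __init__(self, preload: Dict[str, str]) -> None:
        # Maps the alias used in notebooks (e.g. "np") to a module name (e.g. "numpy").
        self._modules: Dict[str, ModuleType] = {
            alias: importlib.import_module(name) for alias, name in preload.items()
        }

    def run(self, cells: List[str]) -> dict:
        """Execute code-cell sources in order, in a namespace fresh for this run."""
        namespace: dict = {"__name__": "__main__", **self._modules}
        for index, source in enumerate(cells):
            code = compile(source, f"<cell {index}>", "exec")
            exec(code, namespace)
        return namespace
```

With this design, `ReusableShell({"np": "numpy", "pd": "pandas"})` would be created once per process and `run` called per notebook. Whatever namespace `run` returns keeps that run's objects alive, so a harness should drop it as soon as it has checked the results.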