-
Notifications
You must be signed in to change notification settings - Fork 785
Manually call autodiff cleanup #4075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
To note, this PR probably has deadlocks on it, given the github action test runs. |
|
Let me know when it is stable again and I will battle test your branch. I am running training continuously and benefit a lot for my application from graph cleanup. I will say rc4 has been better with automated graph cleanup, but it isn't perfect. It helps a whole lot to call graph cleanup after I am done running all my backwards calls. |
|
@vadixidav I still haven't investigated the deadlocks, but I often rebase it to And also to note, this PR as it is, you don't need to manually run it (you can if you want), but it automatically runs on every backward as well. Edit: @laggui Also to note, if I moved the "extra cleanup procedure" from this PR to below the current cleanup procedure (not above as it currently is), then it fails (deadlocks aside) in cleaning up. So this is why I believe that this is related to Arc counting on the standard |
|
This PR has been marked as stale because it has not been updated for over a month |
|
This linked PR (#4342) solves the need for the extra cleanup, and in current So now this PR is mostly kept as a drafty way to directly call to the cleanup. |
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (75.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #4075 +/- ##
==========================================
- Coverage 68.65% 68.65% -0.01%
==========================================
Files 1411 1411
Lines 168676 168671 -5
==========================================
- Hits 115804 115796 -8
- Misses 52872 52875 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Yeah that PR fixed a little oversight during the refactor that missed re-using the existing I appreciate you following up on the changes! Given this report, I think we'll eventually close this draft and you won't have to keep it up to date 😄 |
Pull Request Template
Checklist
cargo run-checkscommand has been executed.Related Issues/PRs
Changes
graph_cleanupmethod to theburn::tensor::backend::AutodiffBackendtrait.Testing
Example of usage:
Then by accompanying the system's ram usage, the memory usage doesn't grow (in case it's in the loop). Also, the are no vector/hashmaps/hashsets re-allocations, possibly improving runtime by a little.
I recently rebased the
graph-sweepbranch tomain, as I'm using it locally.In the current
main, the "hot loop stress test" (increased to 10 million iterations) eventually still has the ram going up to around ~1GB, whereasgraph-sweepleaves it limited at ~138kB. I haven't looked much into it why some nodes are missed onmain, but I suspect it could be related to Arc counting (but I'm not sure).Well, the reason why I created another PR is because all updates to the
graph-sweepbranch are ignored on the previous PR, because it is closed. And If the PR was already closed and if I rebased the branch (i.e. force-pushed the branch), I can no longer re-open it (-.-)', it appears to be a Github thing. So if it is ok, and contrary to what I did, I'd like to leave this PR open, even if it's open-ended.