-
Notifications
You must be signed in to change notification settings - Fork 31
Improve existing mmap file handling #303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves existing memory-mapped (mmap) file handling by relaxing validation checks when runs are resumed or overwritten. The changes allow reuse of existing mmap files in these scenarios, assuming the underlying data hasn't changed.
- Adds
resume_interrupted
andoverwrite
parameters to mmap-related functions to enable conditional file reuse - Updates error message formatting with proper f-string usage
- Modifies all test calls to include the new boolean parameters
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/lightly_train/_commands/common_helpers.py | Core logic changes to add resume/overwrite parameters and relaxed validation |
src/lightly_train/_commands/train.py | Updated function calls to pass config flags for resume and overwrite |
src/lightly_train/_commands/embed.py | Updated function calls with new parameters, hardcoded resume to False |
tests/_commands/test_common_helpers.py | Updated all test function calls to include the new boolean parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved!
What has changed and why?
This should "fix" most occurrences where we raise an error because a mmap file already exists and we don't know whether it is the correct one. It makes the assumption that
resume_interrupted
andoverwrite
are used without changing the underlying data. Forresume_interrupted
this makes always sense because resume doesn't work properly if you change the data anyways. Foroverwrite
this makes sense if we assume that overwrite is mostly used during debugging. Proper runs should always use a new directory.How has it been tested?
Did you update CHANGELOG.md?
Did you update the documentation?