Optimize LocalZipStorageHandler to avoid reopening ZIP file for each WARC record lookup #25

Open
4 tasks
leewesleyv opened this issue Dec 31, 2024 · 0 comments
Labels
cleanup/optimisation Refactoring or other code improvements


@leewesleyv
Collaborator

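Currently, LocalZipStorageHandler reopens the ZIP file for every WARC record lookup and again when fetching the index. Each reopen repeats the cost of opening the archive, so the overhead grows with the number of lookups. The handler should open the file once and reuse the handle.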

Proposed Changes

  • Use a context manager or an initialization process to open the ZIP file once and keep it open for subsequent operations.
  • Ensure the file is properly closed when the LocalZipStorageHandler instance is no longer needed (e.g., implement __enter__ and __exit__ methods).
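
A rough sketch of what this could look like, assuming the handler is built on Python's standard zipfile module. The constructor signature, the read_member method, and the member names in the usage part are hypothetical placeholders, not the project's actual API:

```python
import io
import zipfile


class LocalZipStorageHandler:
    """Hypothetical sketch: open the archive once and reuse the handle."""

    def __init__(self, source):
        # `source` may be a filesystem path or a binary file-like object.
        # The archive is opened a single time, here.
        self._zip = zipfile.ZipFile(source, "r")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def close(self):
        # ZipFile.close() is a no-op if the archive is already closed.
        self._zip.close()

    def read_member(self, name):
        # Hypothetical accessor standing in for the index fetch and the
        # WARC record lookups; both would go through the same handle.
        return self._zip.read(name)


# Usage with an in-memory archive (member names are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("index.cdx", "example index")
    archive.writestr("records/0.warc", "example record")

with LocalZipStorageHandler(buffer) as handler:
    index = handler.read_member("index.cdx")
    record = handler.read_member("records/0.warc")
```

Keeping an explicit close() next to __enter__ and __exit__ would let callers that cannot use a with block still release the handle.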

Tasks

  • Modify LocalZipStorageHandler to keep the ZIP file open during its lifetime.
  • Implement __enter__ and __exit__ methods to support proper resource cleanup.
  • Update existing methods to use the persistent file handle.
  • Write tests to verify that the file handle is reused and closed properly.
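
For the testing task, one possible approach is to spy on zipfile.ZipFile and count how often it is called. This is only a sketch: it assumes the handler opens the archive through zipfile.ZipFile, and _StandInHandler is a hypothetical placeholder to be swapped for the real LocalZipStorageHandler:

```python
import io
import zipfile
from unittest import mock


class _StandInHandler:
    """Hypothetical stand-in for the real LocalZipStorageHandler."""

    def __init__(self, source):
        self._zip = zipfile.ZipFile(source, "r")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._zip.close()

    def read_member(self, name):
        return self._zip.read(name)


def make_archive():
    # Build a small in-memory archive before the spy is installed,
    # so that this write-mode open is not counted.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.warc", "first")
        archive.writestr("b.warc", "second")
    return buffer


def test_handle_is_reused_and_closed():
    buffer = make_archive()
    # wraps= forwards every call to the real class while recording it.
    with mock.patch.object(zipfile, "ZipFile", wraps=zipfile.ZipFile) as spy:
        with _StandInHandler(buffer) as handler:
            assert handler.read_member("a.warc") == b"first"
            assert handler.read_member("b.warc") == b"second"
    # Two lookups, but the archive was opened only once.
    assert spy.call_count == 1
    # Reading from a closed ZipFile raises ValueError.
    try:
        handler.read_member("a.warc")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError after close")
```

The function follows the pytest naming convention but can also be called directly.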
@leewesleyv added the cleanup/optimisation label (Refactoring or other code improvements) on Dec 31, 2024