-
Notifications
You must be signed in to change notification settings - Fork 229
Prior check if migration needed on SqliteZipBackend initialisation #6963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6963 +/- ##
==========================================
+ Coverage 79.05% 79.06% +0.02%
==========================================
Files 566 566
Lines 43675 43696 +21
==========================================
+ Hits 34522 34546 +24
+ Misses 9153 9150 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@@ -528,6 +528,7 @@ def create_profile( | |||
|
|||
LOGGER.report('Initialising the storage backend.') | |||
try: | |||
# PRCOMMENT: Not sure what this context manager is for? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This context manager changes sys.stdout
temporary (within the context), so this it is doing sys.stdout = mystdout
. Since it does not store the redirected information anywhere, I think person just wanted to prevent whatever profile.storage_cls.initialise(profile)
is outputting is printed to the user terminal. Outside of the context you put pack sys.stdout
back to what it was before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah you already have remove it in a PR. Okay i see your question was more about why do we redirect this and not print it to the terminal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit afraid that the removal now triggers stdout at places where you actually don't want it to happen, because we use this function at places where the user would be confused about migration. Check usage of create_profile
. For example now verdi presto
prints information about initialisation. I think in the case of verdi presto
the additional prints are fine and the other usages I seem also fine but I did not thoroughly checked. Hope you checked it when merging the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, that's a valid concern. Though, if migration is happening, the user should always be told, I think. In the case I was looking at, archive migration as part of sqlite_zip profile migration, it did make sense to print the output. However, I didn't verify if it also captures other, unwanted output elsewhere, though. Will check this. Though, anyway, it might be better to solve this be setting the logger messages at different log levels, instead of capturing all stdout, as it was done. TBD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, with verdi presto
and verdi profile setup core.sqlite_dos
, the only additional logging that comes from the change in PR #6964 is:
Report: Migrating to the head of the main branch
Which is probably even wrong in the first place... Similar to the spirit of this PR, why is there even any migration-related code being called during creation of a fresh profile, with new databases 🤔
90f3f38
to
b66fbe6
Compare
@superstar54, maybe we can review this together during the coding days? Would you have time? |
Sure! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @GeigerJ2 , thanks for the work. Overall looks good to me. I added a few comments on the tests and logger.
# Test: force overwrite existing output | ||
output_path.write_text('existing content') | ||
migrate(input_path, output_path, latest_version, force=True) | ||
assert zipfile.is_zipfile(output_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, you should check if the output_path
is new and is different from the previous output_path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually implicitly tested via is_zipfile
. The original file is a text file, hence it would fail the is_zipfile
test without the migration. I added an explicit check for this failure above, to make it more clear.
zf.writestr('metadata.json', json.dumps(metadata)) | ||
assert input_path.exists() | ||
|
||
# Test: different paths, should copy file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a test when the paths are identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! This case was actually not handled and instead excepting the migrate
function, so I added an early return there, as well as a test.
|
||
# The archive exists but ``reset == False``, so we try to migrate to the latest schema version. If the | ||
# migration works, we replace the original archive with the migrated one. | ||
with tempfile.TemporaryDirectory() as dirpath: | ||
filepath_migrated = Path(dirpath) / 'migrated.zip' | ||
LOGGER.report(f'Migrating existing {cls.__name__}') | ||
migrate(filepath_archive, filepath_migrated, cls.version_head()) | ||
LOGGER.report(f'Migrating existing {cls.__name__} to {target_version}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if check_migration_needed
return current_version
LOGGER.report(f'Migrating existing {cls.__name__} to {target_version}') | |
LOGGER.report(f'Migrating existing {cls.__name__} from version {current_version} to {target_version}') |
if not
LOGGER.report(f'Migrating existing {cls.__name__} to {target_version}') | |
LOGGER.report(f'Migrating existing {cls.__name__} to version {target_version}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
raise StorageMigrationError(msg) | ||
|
||
# check if migration is needed | ||
return current_version != target_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can return the current_version
and use it in the log.
return current_version != target_version | |
return current_version != target_version, current_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, but didn't like this approach because it muddied the return of the function, and in most cases, I ended up just throwing the current_version
return away (apart from logging). This led to a general refactor now, where I split the check_migration_needed
code into two functions, get_current_archive_version
and validate_archive_versions
, both as staticmethods
of the SqliteZipBackend
. I feel like this is more readable. Happy to read your thoughts :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented that, but was not too happy with it, as this approach muddies the return of the function, and in most cases, I was just throwing away the current version return. I actually ended up refactoring the code, and split the logic of the check_migration_needed
function into get_current_archive_version
and validate_archive_versions
, both implemented as staticmethods of the SqliteZipBackend
class. Happy to hear your thoughts :)
Thanks a lot for the review, @superstar54! Will address your comments today :) |
@superstar54, OK, took a bit longer, sorry :D I addressed all your points. Could I ask you to have another look. Thanks! |
Problem
The
SqliteZipBackend.initialise()
method was always running the migration code during initialization, even when archives were already at the target version.Solution
This PR introduces the
check_migration_needed()
function that validates whether migration is actually required before attempting the migration process, with the implementation basically just being checks factored out which were previously part of themigrate
method.Now,
initialise()
performs this check early and returns immediately with appropriate logging if no migration is needed. Themigrate()
function has also been refactored to use this same validation logic.Tests were added for the
check_migration_needed
function, as well as for the two paths duringinitilialise
(migration required or not). Previously, no tests were in place for thesrc/aiida/storage/sqlite_zip/migrator.py
file, so thetest_migrator.py
is newly added.Closes #6961.