Skip to content

Conversation

GeigerJ2
Copy link
Contributor

@GeigerJ2 GeigerJ2 commented Jul 23, 2025

Problem

The SqliteZipBackend.initialise() method was always running the migration code during initialization, even when archives were already at the target version.

Solution

This PR introduces the check_migration_needed() function that validates whether migration is actually required before attempting the migration process, with the implementation basically just being checks factored out which were previously part of the migrate method.

Now, initialise() performs this check early and returns immediately with appropriate logging if no migration is needed. The migrate() function has also been refactored to use this same validation logic.

Tests were added for the check_migration_needed function, as well as for the two paths during initilialise (migration required or not). Previously, no tests were in place for the src/aiida/storage/sqlite_zip/migrator.py file, so the test_migrator.py is newly added.

Closes #6961.

Copy link

codecov bot commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 97.56098% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 79.06%. Comparing base (313f342) to head (e6e611d).

Files with missing lines Patch % Lines
src/aiida/storage/sqlite_zip/backend.py 96.78% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6963      +/-   ##
==========================================
+ Coverage   79.05%   79.06%   +0.02%     
==========================================
  Files         566      566              
  Lines       43675    43696      +21     
==========================================
+ Hits        34522    34546      +24     
+ Misses       9153     9150       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@@ -528,6 +528,7 @@ def create_profile(

LOGGER.report('Initialising the storage backend.')
try:
# PRCOMMENT: Not sure what this context manager is for?
Copy link
Contributor

@agoscinski agoscinski Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This context manager changes sys.stdout temporary (within the context), so this it is doing sys.stdout = mystdout. Since it does not store the redirected information anywhere, I think person just wanted to prevent whatever profile.storage_cls.initialise(profile) is outputting is printed to the user terminal. Outside of the context you put pack sys.stdout back to what it was before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah you already have remove it in a PR. Okay i see your question was more about why do we redirect this and not print it to the terminal

Copy link
Contributor

@agoscinski agoscinski Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit afraid that the removal now triggers stdout at places where you actually don't want it to happen, because we use this function at places where the user would be confused about migration. Check usage of create_profile. For example now verdi presto prints information about initialisation. I think in the case of verdi presto the additional prints are fine and the other usages I seem also fine but I did not thoroughly checked. Hope you checked it when merging the PR.

Copy link
Contributor Author

@GeigerJ2 GeigerJ2 Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right, that's a valid concern. Though, if migration is happening, the user should always be told, I think. In the case I was looking at, archive migration as part of sqlite_zip profile migration, it did make sense to print the output. However, I didn't verify if it also captures other, unwanted output elsewhere, though. Will check this. Though, anyway, it might be better to solve this be setting the logger messages at different log levels, instead of capturing all stdout, as it was done. TBD

Copy link
Contributor Author

@GeigerJ2 GeigerJ2 Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, with verdi presto and verdi profile setup core.sqlite_dos, the only additional logging that comes from the change in PR #6964 is:

Report: Migrating to the head of the main branch

Which is probably even wrong in the first place... Similar to the spirit of this PR, why is there even any migration-related code being called during creation of a fresh profile, with new databases 🤔

@GeigerJ2 GeigerJ2 marked this pull request as ready for review August 12, 2025 12:51
@GeigerJ2 GeigerJ2 force-pushed the sqlite-check-migration-needed branch from 90f3f38 to b66fbe6 Compare August 13, 2025 07:20
@GeigerJ2 GeigerJ2 requested review from unkcpz and khsrali and removed request for unkcpz and khsrali August 13, 2025 10:49
@GeigerJ2 GeigerJ2 self-assigned this Aug 19, 2025
@GeigerJ2
Copy link
Contributor Author

@superstar54, maybe we can review this together during the coding days? Would you have time?

@superstar54
Copy link
Member

@superstar54, maybe we can review this together during the coding days? Would you have time?

Sure!

@GeigerJ2 GeigerJ2 assigned superstar54 and unassigned GeigerJ2 Aug 21, 2025
Copy link
Member

@superstar54 superstar54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @GeigerJ2 , thanks for the work. Overall looks good to me. I added a few comments on the tests and logger.

# Test: force overwrite existing output
output_path.write_text('existing content')
migrate(input_path, output_path, latest_version, force=True)
assert zipfile.is_zipfile(output_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, you should check if the output_path is new and is different from the previous output_path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually implicitly tested via is_zipfile. The original file is a text file, hence it would fail the is_zipfile test without the migration. I added an explicit check for this failure above, to make it more clear.

zf.writestr('metadata.json', json.dumps(metadata))
assert input_path.exists()

# Test: different paths, should copy file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test when the paths are identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This case was actually not handled and instead excepting the migrate function, so I added an early return there, as well as a test.


# The archive exists but ``reset == False``, so we try to migrate to the latest schema version. If the
# migration works, we replace the original archive with the migrated one.
with tempfile.TemporaryDirectory() as dirpath:
filepath_migrated = Path(dirpath) / 'migrated.zip'
LOGGER.report(f'Migrating existing {cls.__name__}')
migrate(filepath_archive, filepath_migrated, cls.version_head())
LOGGER.report(f'Migrating existing {cls.__name__} to {target_version}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if check_migration_needed return current_version

Suggested change
LOGGER.report(f'Migrating existing {cls.__name__} to {target_version}')
LOGGER.report(f'Migrating existing {cls.__name__} from version {current_version} to {target_version}')

if not

Suggested change
LOGGER.report(f'Migrating existing {cls.__name__} to {target_version}')
LOGGER.report(f'Migrating existing {cls.__name__} to version {target_version}')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

raise StorageMigrationError(msg)

# check if migration is needed
return current_version != target_version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can return the current_version and use it in the log.

Suggested change
return current_version != target_version
return current_version != target_version, current_version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, but didn't like this approach because it muddied the return of the function, and in most cases, I ended up just throwing the current_version return away (apart from logging). This led to a general refactor now, where I split the check_migration_needed code into two functions, get_current_archive_version and validate_archive_versions, both as staticmethods of the SqliteZipBackend. I feel like this is more readable. Happy to read your thoughts :)

Copy link
Contributor Author

@GeigerJ2 GeigerJ2 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented that, but was not too happy with it, as this approach muddies the return of the function, and in most cases, I was just throwing away the current version return. I actually ended up refactoring the code, and split the logic of the check_migration_needed function into get_current_archive_version and validate_archive_versions, both implemented as staticmethods of the SqliteZipBackend class. Happy to hear your thoughts :)

@GeigerJ2
Copy link
Contributor Author

Thanks a lot for the review, @superstar54! Will address your comments today :)

@GeigerJ2
Copy link
Contributor Author

@superstar54, OK, took a bit longer, sorry :D I addressed all your points. Could I ask you to have another look. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid always triggering migration code on SqliteZip initialisation
3 participants