FIX: Do not evaluate callable defaults during tab-completion #6144
@@ -158,7 +158,7 @@ def get_default(self, ctx: click.Context, call: bool = True) -> t.Optional[t.Uni
         if self._contextual_default is not None:
             default = self._contextual_default(ctx)
         else:
-            default = super().get_default(ctx)
+            default = super().get_default(ctx, call=call)
Unrelated fix
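The one-line change above forwards the `call` flag to the parent implementation. A minimal sketch of why that matters, using hypothetical stand-in classes (`BaseOption`, `BrokenOption`, `FixedOption`) rather than the real `click` or AiiDA classes:

```python
class BaseOption:
    """Hypothetical stand-in for a parent option class that accepts a ``call`` flag."""

    def __init__(self, default):
        self.default = default

    def get_default(self, call=True):
        # Only evaluate a callable default when the caller asks for it.
        if call and callable(self.default):
            return self.default()
        return self.default


class BrokenOption(BaseOption):
    def get_default(self, call=True):
        # Drops ``call``, so the parent silently falls back to ``call=True``.
        return super().get_default()


class FixedOption(BaseOption):
    def get_default(self, call=True):
        # Forwards ``call``, so ``call=False`` is respected.
        return super().get_default(call=call)


evaluations = []


def expensive_default():
    """Record each evaluation so the difference is observable."""
    evaluations.append('evaluated')
    return 'value'


broken_result = BrokenOption(expensive_default).get_default(call=False)
broken_count = len(evaluations)

fixed_result = FixedOption(expensive_default).get_default(call=False)
fixed_count = len(evaluations) - broken_count
```

With the flag dropped, the default is evaluated even though the caller passed `call=False`; with the flag forwarded, the callable itself is returned unevaluated.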
@sphuber this is ready for review
Thanks @danielhollas. As for the test, maybe you could try something like the following:

    # -*- coding: utf-8 -*-
    ###########################################################################
    # Copyright (c), The AiiDA team. All rights reserved.                     #
    # This file is part of the AiiDA code.                                    #
    #                                                                         #
    # The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
    # For further information on the license, see the LICENSE.txt file        #
    # For further information please visit http://www.aiida.net               #
    ###########################################################################
    # pylint: disable=redefined-outer-name
    """Tests for the :mod:`aiida.cmdline.params.options.callable` module."""
    import pytest
    from click.shell_completion import ShellComplete

    from aiida.cmdline.commands.cmd_verdi import verdi


    def _get_completions(cli, args, incomplete):
        comp = ShellComplete(cli, {}, cli.name, '_CLICK_COMPLETE')
        return comp.get_completions(args, incomplete)


    @pytest.fixture
    def unload_config():
        """Temporarily unload the config by setting ``aiida.manage.configuration.CONFIG`` to ``None``."""
        from aiida.manage import configuration

        config = configuration.CONFIG

        try:
            configuration.CONFIG = None
            yield
        finally:
            configuration.CONFIG = config


    @pytest.mark.usefixtures('unload_config')
    def test_callable_default_resilient_parsing():
        """Test that tab-completion of ``verdi`` does not evaluate defaults that load the config, which is expensive."""
        from aiida.manage import configuration

        assert configuration.CONFIG is None
        [c.value for c in _get_completions(verdi, [], '')]
        assert configuration.CONFIG is None
This fails for the main branch, as it should. If it passes on your branch, I would say this provides some assurance that it is working as intended.
@@ -145,6 +146,7 @@ def set_log_level(_ctx, _param, value):
     'profile',
     type=types.ProfileParamType(),
     default=defaults.get_default_profile,
+    cls=CallableDefaultOption,
Is this really the only option that has an expensive callable default?
It's the only place where the config/profile is loaded; all the others use the InteractiveOption, where this is already handled. You are right that there are likely other expensive defaults, but I plan to look into this in a follow-up PR, where I will also look at the timings more closely.
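To illustrate the idea of an option class that does not evaluate its callable default during tab-completion, here is a minimal sketch. The class name `LazyDefaultOption` and the `expensive_default` function are hypothetical, and the actual `CallableDefaultOption` in this PR may be implemented differently; the sketch only assumes the public `click` 8 API (`Option.get_default` and `Context.resilient_parsing`).

```python
import click


class LazyDefaultOption(click.Option):
    """Hypothetical option that skips its default while the context is in resilient-parsing mode."""

    def get_default(self, ctx, call=True):
        # click sets ``resilient_parsing`` on the context during shell completion.
        if ctx.resilient_parsing:
            return None
        return super().get_default(ctx, call=call)


evaluations = []


def expensive_default():
    """Stand-in for a default that would load the config or profile."""
    evaluations.append('evaluated')
    return 'default-profile'


option = LazyDefaultOption(['--profile'], default=expensive_default)
command = click.Command('demo', params=[option])

# Context as it would look during tab-completion.
completion_ctx = click.Context(command, resilient_parsing=True)
completion_default = option.get_default(completion_ctx)
evaluations_during_completion = len(evaluations)

# Context as it would look during a normal invocation.
normal_ctx = click.Context(command)
normal_default = option.get_default(normal_ctx)
```

The guard lives in the option class rather than in the default function, so any callable can be used as a default without knowing about completion.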
631cf97 to e265067
Seems like the test is doing its job 👍
Haha, indeed, thank you very much for the test! This is now ready from my side.
    from aiida.manage import configuration

    config = configuration.CONFIG
    configuration.CONFIG = None
NOTE: I have removed the try/finally block. I don't think it is necessary: pytest should ensure that the fixture runs to completion after the test, unless the fixture itself raises before the yield point, and here we only have two assignments before it.
https://docs.pytest.org/en/latest/how-to/fixtures.html#teardown-cleanup-aka-fixture-finalization
https://docs.pytest.org/en/latest/how-to/fixtures.html#safe-teardowns
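The reasoning above can be illustrated without pytest. The following is a simplified model, not pytest's actual code: a hypothetical `run_with_fixture` helper advances the fixture generator to its `yield`, runs the test, and then resumes the generator for teardown without throwing the test's exception into it. The `Configuration` class is a stand-in for `aiida.manage.configuration`.

```python
class Configuration:
    """Stand-in for the module-level ``CONFIG`` global."""

    CONFIG = 'loaded-config'


def unload_config():
    """Yield-style fixture without try/finally, as in the PR."""
    saved = Configuration.CONFIG
    Configuration.CONFIG = None
    yield
    Configuration.CONFIG = saved


def run_with_fixture(fixture, test):
    """Simplified model of how a test runner drives a yield fixture."""
    generator = fixture()
    next(generator)  # Setup: run the fixture up to its ``yield``.
    error = None
    try:
        test()
    except Exception as exc:  # The test failure is recorded, not sent into the fixture.
        error = exc
    try:
        next(generator)  # Teardown: resume the fixture after its ``yield``.
    except StopIteration:
        pass
    return error


def failing_test():
    assert Configuration.CONFIG is None
    raise RuntimeError('simulated test failure')


error = run_with_fixture(unload_config, failing_test)
```

Because the failure never reaches the generator, the code after `yield` still runs and the original value is restored.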

    assert configuration.CONFIG is None
    completions = [c.value for c in _get_completions(verdi, [], '')]
    assert 'help' in completions
I noticed that we do not test autocompletion anywhere else in the test suite. I'll try adding more tests in a separate PR; for now I added at least this simple assert (also to shut up pylint, which was complaining about an unassigned expression).
    if not _ctx.resilient_parsing:
        configure_logging()
I am a bit surprised by this change. This function, set_log_level, is only assigned as the callback of the VERBOSITY option. I don't think this is supposed to be called during tab-completion anyway. I just tested this and it indeed doesn't seem to be called during tab-completion. Was this the part of the code that caused the new test to fail? Do you understand why?
Good catch, and sorry for not being clear. I am also confused, but if you remove it, the test fails. Yet when I test the completion on the actual command line, it is not called. Maybe the click function used in the test is not exactly the one that gets called? By the way, I was testing on Bash; I wonder whether other shells behave differently.
Perhaps there is some weird interaction with the test suite. Not sure if it is worth deeper investigation, since the change itself seems like an okay thing to do on its own.
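A minimal sketch of the guard under discussion, assuming only the public `click` 8 API. The `cli` command and the `configured_levels` list are hypothetical; appending to the list stands in for the real `configure_logging()` call. Passing `resilient_parsing=True` to `make_context` mimics the context that click builds during shell completion.

```python
import click

configured_levels = []


def set_log_level(ctx, _param, value):
    """Option callback that skips its side effect during resilient parsing."""
    if not ctx.resilient_parsing:
        configured_levels.append(value)  # Stand-in for ``configure_logging()``.
    return value


@click.command()
@click.option('--verbosity', default='REPORT', callback=set_log_level)
def cli(verbosity):
    """Hypothetical command used only for this illustration."""


# Context creation as done for tab-completion: the callback runs, the side effect does not.
completion_ctx = cli.make_context('cli', [], resilient_parsing=True)
levels_after_completion = list(configured_levels)

# Context creation for a normal invocation: the side effect runs once.
normal_ctx = cli.make_context('cli', [])
levels_after_normal = list(configured_levels)
```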
Figured it out. I tested whether this was being called when actually tab-completing verdi by adding a print statement. Since that print statement never showed up, I concluded that the function wasn't being called. But that is not true. It was actually called, but during tab-completion all output to sys.stdout is captured, so I didn't see anything. Printing to sys.stderr would actually show, and simply raising an exception would also confirm that the function was being called.
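The debugging pitfall described above can be simulated in plain Python. This sketch assumes, as the comment reports, that only stdout is captured during completion; `contextlib.redirect_stdout` stands in for the shell's capture, and `callback_with_debug_output` is a hypothetical function.

```python
import contextlib
import io
import sys


def callback_with_debug_output():
    """Emit one debug line on each stream."""
    print('called (stdout)')
    print('called (stderr)', file=sys.stderr)


# Simulate the completion machinery capturing stdout only.
captured_stdout = io.StringIO()
with contextlib.redirect_stdout(captured_stdout):
    callback_with_debug_output()

# The stdout line ends up in the capture buffer instead of the terminal,
# while the stderr line is written to the real stderr stream.
swallowed_output = captured_stdout.getvalue()
```

For quick checks of whether a callback runs during completion, writing to stderr (for example with `click.echo(..., err=True)`) avoids the capture.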
Thanks @danielhollas
Update test_execmanager.py `CalcJobNode`: Fix validation for `depth=None` in `retrieve_list` (#6078) Commit a1b9f79a97c5e987aa900c1db3258339abaa6aa3 added support for using `None` as the third element in a directive of the `retrieve_list` of a `CalcJob`. However, the method `_validate_retrieval_directive` that validates the retrieve list directives when stored on the `CalcJobNode` was not updated and would only still accept integers. update run methods CLI: Fix bug in `verdi data core.trajectory show` for various formats (#5394) These minor bugs went unnoticed because the methods are wholly untested. This is partly because they rely on additional Python modules or external executables. For the formats that rely on external executables, i.e., `jmol` and `xcrysden`, the `subprocess.check_output` function is monkeypatched to prevent the actual executable from being called. This tests all code except for the actual external executable, which at least gives coverage of our code. The test for `mpl_pos` needed to be monkeypatched as well. This is because the `_show_mpl_pos` method calls `plot_positions_xyz` which imports `matplotlib.pyplot` and for some completely unknown reason, this causes `tests/storage/psql_dos/test_backend.py::test_unload_profile` to fail. For some reason, merely importing `matplotlib` (even here directly in the test) will cause that test to claim that there still is something holding on to a reference of an sqlalchemy session that it keeps track of in the `sqlalchemy.orm.session._sessions` weak ref dictionary. Since it is impossible to figure out why the hell importing matplotlib would interact with sqlalchemy sessions, the function that does the import is simply mocked out for now. 
Co-authored-by: Sebastiaan Huber <[email protected]> ORM: Check nodes are from same backend in `validate_link` (#5462) Tests: Fix `StructureData` test breaking for recent `pymatgen` versions (#6088) The roundtrip test for the `StructureData` class using `pymatgen` structures as a go between started failing. The structure is constructed from a CIF file with partial occupancies. The `label` attribute of each site in the pymatgen structure, as returned by `as_dict` would look like the following, originally: ['Bi', 'Bi', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333'] ['Bi', 'Bi', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333'] In commit 63bbd23b57ca2c68eaca07e4915a70ef66e13405, released with v2023.7.14, the CIF parsing logic in `pymatgen` was updated to include parsing of the atom site labels and store them on the site `label` attribute. This would result in the following site labels for the structure parsed directly from the CIF and the one after roundtrip through `StructureData`: ['Bi', 'Bi', 'Se1', 'Se1', 'Se1'] [None, None, None, None, None] The roundtrip returned `None` values because in the previously mentioned commit, the newly added `label` property would return `None` instead of the species label that used to be returned before. This behavior was corrected in commit 9a98f4ce722299d545f2af01a9eaf1c37ff7bd53 and released with v2023.7.20, after which the new behavior is the following: ['Bi', 'Bi', 'Se1', 'Se1', 'Se1'] ['Bi', 'Bi', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333'] The site labels parsed from the CIF are not maintained in the roundtrip because the `StructureData` does not store them. Therefore when the final pymatgen structure is created from it, the `label` is `None` and so defaults to the species name. Since the label information is not persisted in the `StructureData` it is not guaranteed to be maintained in the roundtrip and so it is excluded from the test. 
Devops: Update pre-commit requirement `flynt==1.0.1` (#6093)

Docs: Fix typo in `run_codes.rst` (#6099)

Improve type hinting for `aiida.orm.nodes.data.singlefile`

`SinglefileData`: Add `mode` keyword to `get_content`

This allows a user to retrieve the content in bytes. Currently, a user is forced to use the more elaborate form:

    with singlefile.open(mode='rb') as handle:
        content = handle.read()

or go directly through the repository interface, which is a bit hidden and requires redundantly specifying the filename:

    content = singlefile.base.repository.get_object_content(
        singlefile.filename,
        mode='rb'
    )

These variants can now be simplified to:

    content = singlefile.get_content('rb')

`RemoteData`: Add the `is_cleaned` property (#6101)

This is a convenience method that will return the `KEY_EXTRA_CLEANED` extra, which is set to `True` when the `clean` method is called. The `is_empty` method is also updated to use this new property and shortcut if set to `True`. This saves the method from having to open a transport connection.

Docs: Add links about "entry point" and "plugin" to tutorial (#6095)

The tutorial was missing an explanation of where the entry point for the workflow came from, and how users can write their own plugins and make them available via an entry point.

---------

Co-authored-by: Leopold Talirz <[email protected]>
Co-authored-by: Jusong Yu <[email protected]>

Lazily define `_plugin_type_string` and `_query_type_string` of `Node`

These class attributes require a lookup of whether the `Node` class has a registered entry point, which can have a non-negligible cost. These attributes were defined in the `AbstractNodeMeta` class, which is the metaclass of the `Node` class, which would cause this code to be executed as soon as the class was imported. Here, the `AbstractNodeMeta` metaclass is removed. The `_plugin_type_string` and `_query_type_string` class attributes are changed to class properties.
The actual value is stored in the private attribute analog which is defined lazily the first time the property is accessed. Lazily define `__type_string` in `orm.Group` This is a follow-up on previous commit aiming to speedup the import of the `aiida.orm` by avoiding costly entry point lookups. Here we completely remove the `GroupMeta` metaclass and move its logic into the `_typestring` classproperty, which avoids the code being executed on import while being backwards compatible. Do not import `aiida.cmdline` in `aiida.orm` Remove `with_dbenv` use in `aiida.orm` This forces the import of `aiida.cmdline` in `aiida.orm` which doesn't just slow down, but also is conceptually wrong. The problem of the `with_dbenv` decorator is also that it cannot be imported inside a method to avoid the import cost when importing `aiida.orm` but has to be imported at the top in order to be used. Docs: Improvements to sections containing recently added functionality (#6090) * Daemon API * Processes API * Multiple profile serving for REST API * Controlling MPI when creating a `Code` Devops: Update `pyproject.toml` configuration (#6085) Added stricter rules for `mypy` and `pytest`. Suggestions taken after automated analysis by the following tool: https://learn.scientific-python.org/development/guides/repo-review/ Caching: Try to import an identifier if it is a class path Raise a `ValueError` if the identifier cannot be imported, which will help prevent accidental typos from appearing that the caching configuration is being ignored. Caching: Add the `strict` argument configuration validation So far, the caching configuration validation only considered whether the defined identifiers were valid syntactically. This made it possible for a user to specify a valid identifier but that didn't actually match a class that can be imported or an entry point that cannot be loaded. If this is due to a typo, the user may be confused why the caching config seems to be ignored. 
The caching control functionality adds the `strict` argument, which when set to `True`, besides checking the syntax validity of an identifier, will also try to import/load it and raise a `ValueError` if it fails. By default it is set to `False` to maintain backwards compatibility. CLI: Remove loading backend for `verdi plugin list` This command doesn't need the storage backend and loading it adds significant unnecessary run time. CLI: Add missing entry point groups for `verdi plugin list` The following groups were not part of the entry point group mapping: * `aiida.calculations.monitors` * `aiida.calculations.importers` This made that they were not available as subcommands to `verdi plugin list`. Refactor: Delay import of heavy packages to speed up import time (#6106) The importing of packages `disk_objectstore`, `jsonschema`, `requests`, `plumpy` and `paramiko` are moved from top-level to inside the scopes where they are needed. This significantly improves the load time of the `aiida` package and its subpackages. The `ProcessLauncher` utility had to be removed from resources that are exposed at a higher package level because it required the import of `plumpy` which has a non-negligible import time. This is a breaking change but it is not expected to be used outside of `aiida-core`. Tests: Fix flaky work chain tests using `recwarn` fixture (#6112) The tests were often failing because the `recwarn` fixture contained two records instead of one. The reason is that elsewhere in the code a `ResourceWarning` is emitted because an event-loop is not closed when another one is created. Until this is fixed, the assertion is updated to not check for the number of warnings emitted, but specifically to check the expected warning message is present. ORM: Add the `Entity.get_collection` classmethod This is an alternative for the `collection` class property. 
Where the `collection` property uses the current default storage backend, the `get_collection` allows to specify another specific backend. The original design intended to support this use-case by allowing a `Collection` instance to be "called" with a specific backend: Entity.collection(backend) The `.collection` returns a `Collection` instance which is then called with passing the `backend`, which would return a new `Collection` for the same entity type, but with the other backend. However, this doesn't always work, because the `collection` property will load the backend from the default profile, which will except if not loaded. Although this scenario is unlikely for normal usage, application developers may use AiiDA's API with multiple active backends where no default profile is defined. The reason for a new method instead of changing the `collection` property is that that would be backwards incompatible. ORM: Replace `.collection(backend)` with `.get_collection(backend)` The classproperty will try to load the default storage backend first, before recreating the collection with the specified backend. Not only is this inefficient as the collection is recreated if the `backend` is not the current default one, but it can also fail in situations where there is no default profile is available and the caller wants to directly specify the backend. ORM: Explicitly pass backend when constructing new entity Whenever an ORM entity instance instantiates another entry, it should explicitly pass its own backend as the storage backend to use. Similarly, functions that accept a storage backend as an argument, should consistently pass whenever instantiating a new entity or its collection. Co-Authored-By: Riccardo Bertossa <[email protected]> Docs: Add links to Discourse server (#6111) The Discourse server replaces the mailing list and Slack channel. The README and documentation is updated accordingly. 
Add type hinting for `aiida.orm.nodes.data.array.array`

`ArrayData`: Allow defining array(s) on construction

Currently, the constructor does not allow defining any arrays to set when constructing a new node, so one is forced to write multi-line code:

    node = ArrayData()
    node.set_array('a', np.array([1, 2]))
    node.set_array('b', np.array([3, 4]))

This commit allows initialization upon construction, simplifying the code above to:

    node = ArrayData({'a': np.array([1, 2]), 'b': np.array([3, 4])})

Note that it is also possible to pass a single array to the constructor, in which case the array name is taken from the `default_array_name` class attribute. For backwards compatibility, it remains possible to construct an `ArrayData` without any arrays.

`ArrayData`: Make `name` optional in `get_array`

The `ArrayData` was designed to be able to store multiple numpy arrays. While useful, it forced users to be more verbose than necessary when only storing a single array, as an explicit array name is always required:

    node = ArrayData()
    node.set_array('some_key', numpy.array([]))
    node.get_array('some_key')

The `get_array` method is updated to allow `None` for the `name` argument as long as the node only stores a single array, so that it can return the correct array unambiguously. This simplifies typical user code significantly:

    node = ArrayData(numpy.array([]))
    node.get_array()

Add type hinting for `aiida.orm.nodes.data.array.xy`

`XyData`: Allow defining array(s) on construction

Currently, the constructor does not allow defining any arrays to set when constructing a new node, so one is forced to write multi-line code:

    node = XyData()
    node.set_x(np.array([1, 2]), 'name', 'unit')
    node.set_y(np.array([3, 4]), 'name', 'unit')

This commit allows initialization upon construction, simplifying the code above to:

    node = XyData(
        np.array([1, 2]),
        np.array([3, 4]),
        x_name='name',
        x_unit='unit',
        y_names='name',
        y_units='unit'
    )

The units and names are intentionally made keyword-only arguments in order to prevent accidental swapping of values. For backwards compatibility, it remains possible to construct an `XyData` without any arrays.

Tests: Make `PsqlDosStorage` profile unload test more robust (#6115)

The `test_unload_profile` test verifies that if a loaded profile is unloaded, it properly relinquishes the session that is maintained by sqlalchemy. It did so by checking that after unloading, there were no sessions being referenced. However, this would fail sometimes, because another session may still be held on to, even though that session had nothing to do with the test. A more robust test is simply to check that after unloading, there is exactly one less session being held on to.

Dependencies: Add compatibility for `pymatgen>=v2023.9.2` (#6109)

As of v2023.9.2, the ``properties`` argument of the `Specie` class is removed and the ``spin`` argument should be used instead. See: https://github.com/materialsproject/pymatgen/commit/118c245d6082fe0b13e19d348fc1db9c0d512019

The ``spin`` argument was introduced in v2023.6.28.
See: https://github.com/materialsproject/pymatgen/commit/9f2b3939af45d5129e0778d371d814811924aeb6 Instead of removing support for versions older than v2023.6.28 the code is updated to be able to deal with the new version where `properties` is no longer supported. `BaseRestartWorkChain`: Factor out attachment of outputs (#5983) When a work chain step returns an exit code, the work chain execution is aborted. A common use for the handlers of the `BaseRestartWorkChain` is exactly this, to stop work chain execution when a particular problem or situation is detected. The downside is that no other steps can be called by the work chain implementation, for example, the `results` step to still attach any (partial) results. Of course an implementation could copy the content of the `results` method in the handler to do so, but it would have to copy the contents in each handler that still wanted to attach the outputs, duplicating the work. Here, the actual attaching of the outputs is factored out of the `results` method to the `attach_outputs` method. This method can now easily be called inside a process handler that wants to attach outputs before returning an exit code to stop the work chain. `CalcJob`: Add support for nested targets in `remote_symlink_list` (#5974) It is now possible to specify a target in the `remote_symlink_list` that contains nested directories that do not necessarily exist. The `upload_calculation` will automatically create them before creating the symlink. Improved Docker images (#6080) The current Docker image provided with `aiida-core` depends on `aiida-prerequisites` as a base image. This image is maintained outside of the `aiida-core` repo, making it additional maintenance to keep it up to date when a new `aiida-core` version is released. In addition, the `aiida-prerequisites` image is no longer maintained because the AiiDAlab stack now depends on another base image. 
Finally, the `aiida-prerequisites` design had shortcomings as to how the required services, PostgreSQL and RabbitMQ, are handled. They had to be started manually and were not cleanly stopped on container shutdown. An AEP was submitted to add two Docker images to `aiida-core` that simplifies their maintenance and that improve the usability by properly and automatically handling the services. See the AEP for more details: https://aep.readthedocs.io/en/latest/009_improved_docker_images/readme.html Docs: Correct example of `verdi config unset` in troubleshooting (#6118) Devops: Upload artifact by PR from forks for docker workflow (#6119) Refactor: Delay import of heavy packages to speed up import time (#6116) The importing of packages `urllib`, `yaml` and `pgsu` are moved from top-level to inside the scopes where they are needed. This significantly improves the load time of the `aiida` package and its subpackages. The import of `Transport` in the `aiida.engine` package also slows down imports, but it is only used for type checking, so its import is placed inside the `if TYPE_CHECKING` guard. Finally, the `DEFAULT_DBINFO`, `Postgres` and `PostgresConnectionMode` objects of the `aiida.manage.external.postgres` package are no longer exposed on the top-level as this also slows down imports. This is a breaking change technically, but these resources should not be used by downstream packages. Devops: Use Python 3.10 for `pre-commit` in CI and CD workflows (#6121) Typing: Improve annotations of process functions (#6077) Docs: Update `pydata-sphinx-theme` and add Discourse links (#6120) The `pydata-sphinx-theme` is updated to `v0.13.3` or higher, which is the latest release. This changes the style a bit, but mostly for the better. It also allows to use custom icon links in the top-right header. This is used to add a link to the AiiDA home page (using the AiiDA icon) and add a link to the new Discourse server. 
Another admonition box is added to the landing page that directs users needing support to Discourse.

`InteractiveOption`: Fix validation being skipped if `!` provided

The `InteractiveOption` reserves the exclamation point `!` as a special character in order to "skip" the option and not define it. This is necessary for options that are _not_ required but that do specify a default. Without this character, it is impossible to _not_ set the default: if the user doesn't specify anything and simply presses enter, the default is taken, even if the user did not want to specify any value. The problem with the implementation, however, is that when `!` was provided, the option would return `None` as the value, bypassing any validation that might be defined. This would make it possible to bypass the validation of a required option. The solution is that when `!` is provided for an interactive option, it is translated to `None` and is then processed as normal, validating it as any other value.

CLI: Usability improvements for interactive `verdi setup`

There were a number of ways that a user could break the command by providing incorrect input that was not caught by validation:

* The following options are now required and can no longer incorrectly be skipped with `!`: `user_email`, `user_first_name`, `user_last_name`, `user_institution`, `db_engine`, `db_backend`, `db_host`, `db_port` and the `repository_path`.
* For a missing parameter in interactive mode the error now reads:

      Error: Institution has to be specified

  Instead of:

      Error: Missing parameter: institution

  which should be more intuitive.
* The message that a profile has successfully been created is now only displayed if the storage backend initialized successfully. Before, this was shown before storage initialization, which could then still fail, making the success message confusing.

Devops: Replace outdated link in issue template (#6123)

It was still pointing to the legacy Google mailing list.
The link is updated to point to the Discourse server. `SqliteTempBackend`: Add support for reading from and writing to archives (#5658) To this end, the `bulk_insert` and `bulk_update` are implemented. The archive creation and import functionality currently requires that the repository of the storage backend uses a SHA256 hash for the keys of the objects. This is not the case for the `SandboxRepositoryBackend` that the `SqliteTempBackend` uses. Therefore, the `SandboxRepositoryBackend` is subclassed to `SandboxShaRepositoryBackend` which replaces the UUID key of its parent and uses a SHA256 instead. Dependencies: Add new extra `tui` that provides `verdi` as a TUI (#6071) The `tui` extra installs `trogon`. This package leverages `textual` to turn `verdi`'s `click` interface into a Text-based User Interface (TUI). It is added only if `trogon` is installed and can be imported. When it is installed, it adds the `verdi tui` command, which launches the text-based interface of `verdi`. Co-authored-by: Sebastiaan Huber <[email protected]> Docs: Add important note on using `iterall` and `iterdict` (#6126) Using the `all` and `dict` equivalents are very inefficient for large query results and will lead to performance problems. CLI: Fix `repository` being required for `verdi quicksetup` (#6129) Regression added by c53ea20a497f66bc88f68d0603cf9a32614fc4c2 which made the `--repository` option for `verdi setup` required, as it should be. However, it did so by making the base option required. The problem is that the option for both `verdi setup` as well as `verdi quicksetup` inherit from this, but for `verdi quicksetup` it should not be required as the default will be populated automatically. As an alternative, the option specific for `verdi setup` is now made required. Devops: Follow-up docker build runner macOS-ARM64 (#6127) The buildjet arm64 runner has only three-month trials, after that we need to pay to use it. 
The self-hosted runner is deployed on the macOS-arm64 machine located in PSI. `PsqlDosBackend`: Fix `Node.store` excepting when inside a transaction (#6125) Calling `Node.store` with the `PsqlDosBackend` would except whenever inside a transaction, for example, when iterating over a `QueryBuilder` result, which opens a transaction. The reason is that the node implementation of the `PsqlDosBackend`, the `SqlaNode.store` method calls `commit` on the session. This closes the current transaction, and so when it is then used again, for example in the next iteration of the builder results, an exception is raised by sqlalchemy complaining that the transaction was closed. The solution is that `SqlaNode.store` should only commit if it is not inside a nested transaction, otherwise it should simply flush the addition of the node to the session such that automatically generated primary keys are populated. A similar problem was addressed in the `add_nodes` and `remove_nodes` methods of the `SqlaGroup` class. These would also call `commit` at the end, regardless of whether they are called within an open transaction. Devops: Loosen trigger conditions for Docker build CI workflow (#6131) The docker build workflow was only activated when changes were made to either the `.docker` directory or `.github/workflows/docker*.yaml` files. However, changes in the `aiida` package could also break the build and so could pass by unnoticed. The trigger conditions are changed to instead trigger always except for changes to the `tests` and `docs` directories. Refactor: Replace `all` with `iterall` where beneficial (#6130) Whenever a `QueryBuilder` result is used in a loop and the total result is not fully stored in memory in some way, it is beneficial to use `iterall` since that prevents loading everything in memory for no reason. 
📚 `README.md`: Add Discourse shield to header table (#6138) Docs: Changes are reverted if exception during `iterall` (#6128) An explicit test is added to guarantee that changes made while looping over the result of `iterall` or `iterdict` are reverted if an exception is raised and not caught before the end of the iterator. A note is added to the how-to section of the `QueryBuilder`. Devops: Update the `.devcontainer` to use the new docker stack (#6139) DevOps: amendment use aiida-core-base image from ghcr.io (#6141) Amendment to #6139, for unknown reason, docker pull is failed for docker.io on this repository. Using the docker registry ghcr.io works fine. `PsqlDosBackend`: Fix changes not persisted after `iterall` and `iterdict` (#6134) The `iterall` and `iterdict` generators of the `QueryBuilder` implementation for the `PsqlDosBackend` would open a transaction in order for the `ModelWrapper` to not automatically commit the session upon any mutation as that would invalidate the cursor. However, it did not manually commit the session at the end of the iterator, causing any mutations to be lost when the storage backend was reloaded. This problem was not just present in the `iterall` and `iterdict` methods of the `QueryBuilder` but rather the `transaction` method of the `PsqlDosBackend` never commits the savepoint that is returned by the `Session.begin_nested` call. Now the `transaction` explicitly commits the savepoint after the yield and the `QueryBuilder` methods are updated to simply use the `transaction` method of the storage backend, rather than going directly to the session. This change also required a change in the `SqliteZipBackend`, since the `transaction` is now called during archive creation and import, but the backend raised a `NotImplementedError`. This is because it used to be a read-only backend, however, this limitation was recently lifted. The commit simply forgot to implement the `transaction` method. 
Performance: Cache the lookup of entry points (#6124) Entry points are looked up using the `entry_points` callable of the `importlib_metadata` module. It is wrapped by the `eps` function in the `aiida.plugins.entry_point` module. This call, and the `.select()` filter that is used on it to find a specific entry point can be quite expensive as it involves a double loop in the `importlib_metadata` code. Since it is used throughout the `aiida-core` source code whenever an entry point is looked up, this causes a significant slowdown of module imports. The `eps` function now pre-sorts the entry points based on the group. This guarantees that the entry points of groups starting with `aiida.` come first in the lookup, giving a small performance boost. The result is then cached so the sorting is performed just once, which takes on the order of ~30 µs. The most expensive part is still the looping over all entry points when `eps().select()` is called. To alleviate this, the `eps_select` function is added which simply calls through to `eps().select()`, but which allows the calls to be cached. In order to implement the changes, the `importlib_metadata` package, which provides a backport implementation of the `importlib.metadata` module of the standard lib, was updated to v6.0. Docs: Update the image name for docker image (#6143) It was still pointing to the old name instead of the new `aiida-core-with-services`. CLI: Make loading of config lazy for improved responsiveness (#6140) The `VerdiContext` class, which provides the custom context of the `verdi` commands, loads the configuration. This has a non-negligible cost and so slows down the responsiveness of the CLI. This is especially noticeable during tab-completion. The `obj` custom object of the `VerdiContext` is replaced with a subclass of `AttributeDict` that lazily populates the `config` key when it is called with the loaded `Config` class. 
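The lazy population of the `config` key can be sketched as follows. This is a minimal illustration under stated assumptions: it subclasses the built-in `dict` rather than `AttributeDict`, and the names `LazyConfigDict`, `load_config` and `demo` are hypothetical and do not appear in `aiida-core`.

```python
from typing import Any, Callable, Tuple


class LazyConfigDict(dict):
    """Hypothetical mapping with attribute access that loads ``config`` on first use."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        super().__init__()
        self._loader = loader  # stored as a normal attribute, not as a key

    def __missing__(self, key: str) -> Any:
        # Called by ``dict.__getitem__`` only when the key is absent.
        if key == 'config':
            self[key] = self._loader()
            return self[key]
        raise KeyError(key)

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails, so ``_loader`` is unaffected.
        try:
            return self[name]
        except KeyError as exception:
            raise AttributeError(name) from exception


def demo() -> Tuple[int, int, bool]:
    """Return the loader call count before and after access, and whether the value is reused."""
    calls = []

    def load_config() -> dict:
        calls.append('loaded')  # stands in for the expensive config loading
        return {'default_profile': 'main'}

    obj = LazyConfigDict(load_config)
    before = len(calls)
    first = obj.config
    second = obj.config
    return before, len(calls), first is second
```

Constructing the object costs nothing; the loader runs once, on the first access to `config`, and later accesses reuse the stored value.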
In addition, the defaults of some options of the `verdi setup` command, which load a value from the config and so require the config, are turned into partials such that they also are lazily evaluated. These changes should give a reduction in load time of `verdi` of the order of ~50 ms. A test of `verdi setup` had to be updated to explicitly provide a value for the email. This is because now the default is evaluated lazily, i.e. when the command is actually called in the test. At this point, there is no value for this config option and so the default is empty. Before, the default would be evaluated as soon as `aiida.cmdline.commands.cmd_setup` was imported, at which point an existing config would still contain these values, binding them to the default, even if the config would be reset afterwards before the test.

Deprecation: `aiida.orm.nodes.data.upf` and `verdi data core.upf` (#6114)

The `UpfData` data plugin and related utilities have been replaced by the versions maintained in the `aiida-pseudo` plugin. The latter has now been significantly adopted by most users and plugins in the ecosystem, so the outdated original version in `aiida-core` can be deprecated and removed.

Tests: Print stack trace if CLI command excepts with `run_cli_command`

Before, the test would just fail and say that an exception was raised but it would not display the actual exception, making it difficult to debug the problem. In the case of a non-zero exit code, the stderr is printed as well.

Config: Remove use of `NO_DEFAULT` for `Option.default`

The `Option.default` property would return the global constant `NO_DEFAULT` in case the option does not specify a default. The idea was that it could be used to distinguish between an option not defining a default and one defining the default to be `None`. The problem is that the various methods that return a config option value, such as `Config.get_option`, could also return this value.
This would be problematic for certain CLI command options that used the `aiida.manage.configuration.get_config_option` function to set the default. If the config option was not defined, the function would return `()` which is the value of the `NO_DEFAULT` constant. When the option accepts string values, this value would often even silently be accepted although it almost certainly is not what the user intended. This would actually happen for the tests of `verdi setup`, which has the options `--email`, `--first-name`, `--last-name` and `--institution` that all define a default through `get_config_option` and therefore the default would actually be set to `()` in case the config did not specify these global config options. Since for config options there is no current use-case for actually setting a default to `None`, there is no need to distinguish between this case and a default never having been defined, and so the `NO_DEFAULT` global constant is removed and replaced by `None`.

Tests: Fix failing `tests/cmdline/commands/test_setup.py`

The previous commit fixed a bug in the evaluation of defaults for various options of the `verdi setup` command. Due to the bug, these options would set a default even if the corresponding config option was not defined. Instead of no default being defined, the empty tuple `()` would be set as string value. As soon as the bug was fixed, the `test_setup_profile_uuid` test started failing since it doesn't explicitly define values for the options `--email`, `--first-name`, `--last-name` and `--institution`.

ORM: `Sealable.seal()` return `self` instead of `None`

The `Sealable` mixin is used by the `ProcessNode` which allows it to be sealed. By having the `seal` method return `self`, which will be the `ProcessNode` instance, it brings the behavior on par with the `store` method.

ORM: `ProcessNode.is_valid_cache` is `False` for unsealed nodes

When a `ProcessNode` is not yet sealed, it has not officially been terminated.
At this point it cannot yet be considered a valid cache source. However, this condition was not considered by the property `ProcessNode.is_valid_cache`. This bug manifested itself in very rare situations where a race condition could lead to a process being cached from an unsealed node. When a node is stored from the cache, it copies all the attributes except for the sealed key and adds the outputs. The sealing is then left to the engine which will complete the cached process as if it had run normally. The problem arises when the cache source was not yet sealed, and so the outputs had not yet been added. The cached node will then miss the output nodes.

CLI: Do not load config in defaults and callbacks during tab-completion (#6144)

The `get_default_profile` default of the `PROFILE` option and the `set_log_level` callback of the `VERBOSITY` option both load the config. Since defaults and callbacks are also evaluated during tab-completion, this was slowing down tab-completion significantly since loading the config has a non-negligible cost. The `set_log_level` callback is modified to explicitly check whether we are tab-completing, in which case `ctx.resilient_parsing` is set to `True`. In this case, the function now returns `None` and no longer loads the config. For `get_default_profile`, the `CallableDefaultOption` class is added which allows the default to be made a callable, which will return `None` if `ctx.resilient_parsing` is set to `True`.

`FolderData`: Expose repository API on top-level namespace (#6150)

In 8293e453789d0bad9cf631ecfc08542dd9ad892d, the `Node` interface was refactored to move the API of the repository to the `base.repository` namespace. The original methods would be forwarded with a deprecation message being printed.
Although this made sense for most node types, in an effort to clean up the node interface which was overpopulated, for the `FolderData` the repository interface is the main interface and it doesn't make sense to force the users to go all the way down to the nested `base.repository` namespace to access it. Therefore the public API of the repository is restored on the top-level namespace of the `FolderData` class.

Dependencies: Update to `disk-objectstore~=1.0` (#6132)

* Update `DiskObjectStoreRepositoryBackend.get_info` to use the dataclasses returned by `count_objects` and `get_total_size` directly.
* Change `DiskObjectStoreRepositoryBackend.maintain` to always call `clean_storage`, even during live operation of the container.
* Change `DiskObjectStoreRepositoryBackend.maintain` to now pass `CompressMode.AUTO` when `compress` is set to `True`.

Tests: Refactor transport tests from `unittest` to `pytest` (#6152)

Now detects all plugins using `get_entry_points` instead of manually parsing module files to detect `Transport` plugins. Uses `pytest.mark.parametrize` to run all tests for all registered transport plugins.

ORM: Register `numpy.ndarray` with the `to_aiida_type` to `ArrayData` (#6149)

This will allow `numpy.ndarray` to be passed to process inputs that add the `to_aiida_type` serializer and expect an `ArrayData`. The single dispatch will automatically convert the numpy array to an `ArrayData` instance.

Repository: Add the `as_path` context manager (#6151)

The node repository interface intentionally does not provide access to its file objects through filepaths on the file system. This is because, for efficiency reasons, the content of a repository may not actually be stored as individual files on a file system, but for example are stored in an object store. Therefore, the contents of the repository can only be retrieved as a file-like object or read as a string or list of bytes into memory.
Certain use-cases require a file to be made available through a filepath. An example is when it needs to be passed to an API that only accepts a filepath, such as `numpy.loadtxt`. Currently, the user will have to manually copy the repository's content to a temporary file on disk, and pass the temporary filepath. This results in clients often having to resort to the following snippet:

    import pathlib
    import shutil
    import tempfile

    import numpy

    with tempfile.TemporaryDirectory() as tmp_path:

        # Copy the entire content to the temporary folder
        dirpath = pathlib.Path(tmp_path)
        node.base.repository.copy_tree(dirpath)

        # Or copy the content of a file. Should use streaming
        # to avoid reading everything into memory
        filepath = (dirpath / 'some_file.txt')
        with filepath.open('wb') as target:
            with node.base.repository.open('some_file.txt', 'rb') as source:
                shutil.copyfileobj(source, target)

        # Now use `filepath` to library call, e.g.
        numpy.loadtxt(filepath)

This logic is now provided under the `as_path` context manager. This will make it easy to access repository content as files on the local file system. The snippet above is simplified to:

    with node.base.repository.as_path() as filepath:
        numpy.loadtxt(filepath)

The method is exposed directly in the interface of the `FolderData` and `SinglefileData` data types. A warning is added to the docs explaining the inefficiency of the content having to be read and written to a temporary directory first, encouraging it only to be used when the alternative is not an option.

ORM: Add `NodeCaching.CACHED_FROM_KEY` for `_aiida_cached_from` constant

The `_aiida_cached_from` key is used to store, in the extras, the UUID of the node from which a node was cached. It appeared in a few places as a string literal. It is now added as the `CACHED_FROM_KEY` class variable of `NodeCaching`.

CLI: Add `cached` and `cached_from` projections to `verdi process list`

The `cached` projection will print a checkmark for nodes that were cached from another node.
The `cached_from` projection will show the UUID of the cache source, if it exists. The `cached` projection is added to the default projections for `verdi process list`. The use of caching is becoming more prevalent, but often users can still be surprised by certain behavior when processes are taken from cache when they didn't expect it. Showing by default whether a process was taken from cache in the output of `verdi process list` should provide clarity, since this is often the first place that users check.

CLI: Lazily validate entry points in parameter types (#6153)

The command line has recently already been updated to lazily load entry points in order to speed up tab completion. Here, the validation of entry points, which is to check whether a given entry point even exists, is also delayed until the point where it is really necessary. This is again to keep tab-completion responsive, since even checking whether an entry point exists has a non-negligible cost. The `IdentifierParamType` and `PluginParamType` parameter types are refactored to no longer validate entry points upon construction but lazily the first time that they are actually invoked.

`Parser.parse_from_node`: Validate outputs against process spec (#6159)

The `parse_from_node` classmethod is a utility function to call the `parse` method of a `Parser` implementation for a given `CalcJobNode`. It automatically calls `parse` in the correct manner, passing the `retrieved` output node, and wrapping it in a calcfunction to optionally store the provenance. However, since the `calcfunction` by default has a dynamic output namespace and so accepts all outputs, it would not perform the same output validation that the original `CalcJob` would have done since that most likely will have defined specific outputs. Especially given that the `parse_from_node` method would often be used in unit tests to test `Parser` implementations, having the output validation be different would make it possible to miss bugs.
For example, if the parser assigns an invalid output node, this would go unnoticed. The `parse_from_node` is updated to patch the output specification of the `calcfunction` with that of the process class that was used to create the `CalcJobNode` which is being passed as an argument, as long as the process class can be loaded. This ensures that when the `calcfunction` returns the outputs returned by `Parser.parse` they are validated against the output specification of the original `CalcJob` class. If it fails, a `ValueError` is raised.

Config: Switch from `jsonschema` to `pydantic` (#6117)

The configuration of an AiiDA instance is written in JSON format to the `config.json` file. The schema is defined using `jsonschema` to take care of validation, however, some validation, for example of the config options, was still happening manually. Other parts of the code want to start using `pydantic` for model definition and configuration purposes, which has become the de-facto standard for these use-cases in the Python ecosystem. Before introducing another dependency, the existing `jsonschema` approach is replaced by `pydantic` in the current code base first.

Engine: Add the `wait` argument to `submit`

For demos and tutorials, often in interactive notebooks, it is often preferred to use `run` instead of `submit` because in this way the cell will block until the process is done. The cell blocking will signal to the user that the process is still running and as soon as it returns it is immediately clear that the results are ready. With `submit` the cell returns immediately, but the user will now have to resort to manually checking when the process is done. Solutions are to instruct the user to call `verdi process list` manually (which they will have to do repeatedly) or implement some automated loop that checks for the process to terminate. However, using `run` has downsides as well, most notably that the process will be lost if the notebook gets disconnected.
For processes that are expected to run longer, this can be really problematic, and so `submit` will have to be used regardless. Here, the `wait` argument is added to `submit`. It is set to `False` by default to keep the current behavior. When set to `True`, the function will mimic the behavior of `run` and only return when the process has terminated, at which point the node is returned. A `REPORT` log message is emitted each time the state of the process is checked in intervals of `wait_interval`.

Engine: Add the `await_processes` utility function

The recent addition of the `wait` argument to the `submit` function allows a user to submit a process to the daemon, while still having the function block until the process is terminated, as a call to `run` would do. This can be useful in interactive tutorials and demos where the code should not advance until the process is done, but one still wants the benefits of having the daemon run the process. The downside of this approach is that it only allows submitting and waiting for a single process at a time. Here the `await_processes` function is added. It takes a list of process nodes and will wait in a loop for all of them to reach a terminal state. The time between iterations can be controlled by the `wait_interval` argument.

CLI: Keep list unique in `verdi config set --append` (#6162)

When calling the command multiple times for the same value, it would be added multiple times to the list, even though in all cases a unique list would be expected.

CLI: Fix `verdi config set` when setting list option (#6166)

`verdi config set` would except when setting a single value for an option that is of list type, such as `caching.enable_for`. This only started happening after the recent move to `pydantic` for the configuration options. Now the `Option.validate` will correctly raise when trying to set a string value for a list type.
The `verdi config set` implementation is updated to check when it is setting a value for an option with a list type, and in that case, the value is wrapped in a list, unless the `--append` or `--remove` flags are specified.

Docker: Pass environment variable to aiida-prepare script (#6169)

Set `with-contenv` such that environment variables are forwarded. Without this, settings like the work dir of `localhost` will be set incorrectly and will cause calculations to fail.

Dependencies: Update to `sqlalchemy~=2.0` (#6146)

A number of minor changes were required for the update:

* Queries that use `order_by` now need to include the property that is being ordered on in the list of projections.
* The `Session.bulk_update_mappings` and `Session.bulk_insert_mappings` are replaced by using `Session.execute` with the `update` and `insert` methods.
* The `sqlalchemy-utils` dependency is no longer used as well as the `tests/storage/psql_dos/test_utils.py` file that used it.
* The `future=True` is removed from the engine creation. This was a temporary flag to enable v2.0 compatibility with v1.4.
* Test of schema equivalence for export archives needed to be updated since the casting of `UUID` columns for PostgreSQL changed.
* Remove the `sphinx-sqlalchemy` dependency since it is not compatible with `sqlalchemy~=2.0`. The documentation that relied on it to show the database models is temporarily commented out.

Docker: Add folders that automatically run scripts before/after daemon start (#6170)

In order to simplify the implementation of using the `aiida-core` image as the base for customized images, the `run-before-daemon-start` and `run-after-daemon-start` script folders are created. Any executables in these two folders will be executed before and after the AiiDA daemon is started in the container, respectively.
The standard linux `run-parts` tool is used to scan these folders for files, which are run in the lexical sort order of their names, according to the C/POSIX locale character collation rules.

`Config`: Add the `create_profile` method

This method takes a name and storage backend class, along with a dictionary of configuration parameters, and creates a profile for it, initializing the storage backend. If successful, the profile is added to the config and it is saved to disk. It is the `Config` class that defines the "structure" of a profile configuration and so it should be this class that takes care of generating this configuration. The storage configuration is the exception, since there are multiple options for this, where the `StorageBackend` plugin defines the structure of the required configuration dictionary. This method makes it possible to remove all places in the code where a new profile and its configuration dictionary are built up manually.

CLI: Add the command `verdi profile setup`

This command uses the `DynamicEntryPointCommandGroup` to allow creating a new profile with any of the plugins registered in the `aiida.storage` group. Each storage plugin will typically require a different set of configuration parameters to initialize and connect to the storage. These are generated dynamically from the specification returned by the method `get_cli_options` defined on the `StorageBackend` base class. Each plugin implements the abstract `_get_cli_options` method which is called by the former and defines the configuration parameters of the plugin. The values passed to the plugin specific options are used to create an instance of the storage class registered under the chosen entry point, which is then initialised. If successful, the new profile is stored in the `Config` and a default user is created and stored. After that, the profile is ready for use.
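The idea of generating command line options from a plugin-provided specification can be sketched as below. This is not the AiiDA implementation: it uses the standard library `argparse` as a stand-in for the `click`-based `DynamicEntryPointCommandGroup`, and the class names, the `filepath` and `timeout` options and the shape of the specification dictionary are all assumptions made for illustration.

```python
import abc
import argparse
from typing import Any, Dict, Type


class StorageBackendSketch(abc.ABC):
    """Hypothetical base class: the public method delegates to an abstract hook."""

    @classmethod
    def get_cli_options(cls) -> Dict[str, Dict[str, Any]]:
        """Return the option specification defined by the concrete plugin."""
        return cls._get_cli_options()

    @classmethod
    @abc.abstractmethod
    def _get_cli_options(cls) -> Dict[str, Dict[str, Any]]:
        """Return a mapping of option name to ``add_argument`` keyword arguments."""


class SqliteSketch(StorageBackendSketch):
    """Hypothetical plugin that needs a file path and an optional timeout."""

    @classmethod
    def _get_cli_options(cls) -> Dict[str, Dict[str, Any]]:
        return {
            'filepath': {'type': str, 'required': True, 'help': 'Directory of the storage.'},
            'timeout': {'type': int, 'default': 30, 'help': 'Connection timeout in seconds.'},
        }


def build_parser(backend: Type[StorageBackendSketch]) -> argparse.ArgumentParser:
    """Create one command line option per entry in the plugin specification."""
    parser = argparse.ArgumentParser(prog='profile-setup-sketch')
    for name, spec in backend.get_cli_options().items():
        parser.add_argument(f'--{name}', **spec)
    return parser


def demo() -> Dict[str, Any]:
    """Parse an example command line and return the resulting configuration."""
    parser = build_parser(SqliteSketch)
    return vars(parser.parse_args(['--filepath', '/tmp/storage']))
```

The command itself never hard-codes the options of a particular storage plugin; adding a new plugin only requires implementing the hook that returns its specification.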
`DynamicEntryPointCommandGroup`: Use `pydantic` to define config model

The `DynamicEntryPointCommandGroup` depends on the entry point classes to implement the `get_cli_options` method to return a dictionary with a specification of the options to create. The schema of this dictionary was a custom ad-hoc solution for this purpose. Here we switch to using pydantic's `BaseModel` to define the `Config` class attribute which defines the schema for the configuration necessary to construct an instance of the entry point's class.

ORM: Add the `User.is_default` property

This is a useful shortcut to determine whether a `User` instance is the current default user. The previous way of determining this was to retrieve the default user from the collection `User.collection.get_default()` and manually compare it with the `User` instance.

CLI: Improve the formatting of `verdi user list`

Uses the `tabulate` package to create a nicely formatted table as is used in many other `verdi` commands already. The results are ordered by the users' emails.

Manager: Add the `set_default_user_email`

Each profile can define which user in its storage backend should be considered the default. This is necessary because each ORM entity, when created, needs to specify a `User` object and we don't want the user to always have to explicitly define this manually. The default user for a profile is stored in the configuration file by the email of the `User` object. However, in a loaded storage backend, this default user is also cached as the `User` object. This means that when the default user is changed, it should be changed in the configuration file, but if a storage backend is loaded, the cache should also be invalidated, such that the next time the default user is requested, the new one is properly loaded from the database. Since this change affects both the configuration as well as the currently loaded storage, the `set_default_user_email` is added to the `Manager` class, since that controls both.
It calls through to the same method on the `Config` class, which is responsible for updating the `Config` instance in memory and writing the changes to disk. Then the manager resets the default user on the storage backend, if any is loaded. The `verdi user set-default` command is updated to use the new method. A test is added for the command, which didn't exist yet. The command is updated to use `Manager.set_default_user_email` even though it could use `Config.set_default_user_email` since the Python interpreter will shut down immediately after anyway. However, the test would fail if the latter would be used, since the loaded storage backend would not have been updated, which is used by `User.collection.get_default()`. This demonstrates why in active Python interpreters only the method on the manager should be used. A warning is added to the docstring on the configuration class. CLI: Reuse options in `verdi user configure` from setup This way it is guaranteed that the same types are being used, which were actually different. The `--set-default` flag now also gets its default from the current value on the selected user, just as is done for the other properties. CLI: Set defaults for user details in profile setup The user options `--first-name`, `--last-name` and `--institution` in the `verdi quicksetup/setup` commands were recently made required but did not provide a default. This would make creating profiles significantly more complex than always needed. For simple test and demo profiles the user might not necessarily care about these user details. Here we add defaults for these options. Even for production profiles this is a sensible approach since these details can always be freely updated later on with `verdi user configure`. This is also the reason that the `--email` does not provide a default because that can not be changed later on. 
Devops: Trigger Docker image build when pushing to `support/*` branch (#6175)

Dependencies: Add support for Python 3.12

The file `requirements/requirements-py-3.12.txt` is added which provides a complete environment for Python 3.12. The CI is updated to add Python 3.12 in all strategy matrices or replace Python 3.11 where only the oldest and latest Python version are tested. Note that the Python version for the `pre-commit` jobs is kept at 3.10 for now. The reason is that in Python 3.12 f-strings are improved by allowing nested quotes. For example:

    f'{some_dict['key']}'

is now supported, whereas before Python 3.12 this would not work since the nested quotes would not be parsed correctly and the internal quotes had to be either escaped or changed for double quotes. A number of dependencies had to be updated to make them compatible with Python 3.12, usually because older versions still relied on the `distutils` standard library module, which has been removed, or on `pkg_resources`, which is part of `setuptools` and is no longer installed by default. The `utils/dependency_management.py` had to be updated similarly to replace `pkg_resources` with `packaging`. The latter had to be updated to `packaging==23.0` in order to have the `__eq__` implementation for the `Requirement` class which the script relies on. The memory leak tests are skipped on Python 3.12 because currently they hang. The problem is with the `pympler.muppy.get_objects` method. This calls `gc.collect` internally, but that call is blocking. The exact cause is as of yet unknown. The garbage collecting has been changed in Python 3.12 so it is not completely unexpected either. The `sphinxcontrib-details-directive` dependency is removed. It was used for the sphinx extension to add the ports of port namespaces in HTML's `<details>` tags, allowing them to be collapsed. This could help with readability in case of large namespaces. However, this package breaks on Python 3.12 since it imports the deprecated `pkg_resources` package.
Since the package has not been maintained for 4 years, it is unlikely that this will be fixed, and so it is removed for now. See https://github.com/sphinx-contrib/sphinxcontrib-details-directive

Dependencies: Restore `sphinx-sqlalchemy`

This dependency was temporarily removed since it didn't yet support sqlalchemy v2, but support for it has now been released with `v0.2.0`.

Add the `SqliteDosStorage` storage backend

The implementation subclasses the `PsqlDosBackend` and replaces the PostgreSQL database with an sqlite database. By doing so, the initialization of the storage only requires a directory on the local file system where it will create the sqlite file for the database and a container for the disk-objectstore. The advantage of this `sqlite_dos` storage over the default `psql_dos` is that it doesn't require a system service like PostgreSQL. As a result, creating the storage is very straightforward and can be done with almost no setup. The advantage over the existing `sqlite_zip` is that the `sqlite_dos` is not read-only but can be used to write data as well. Combined with the `verdi profile setup` command, a working profile can be created with a single command:

    verdi profile setup core.sqlite_dos -n --profile name --email e@mail

This makes this storage backend very useful for tutorials and demos that don't rely on performance.

`SqliteZipBackend`: Return `self` in `store`

The `store` method of the `SqliteEntityOverride` class, used by the `SqliteZipBackend` storage backend (and with that all other backends that subclass it), did not return `self`. This is in conflict with the signature of the base class that it is overriding. Since the `SqliteZipBackend` is read-only and so `store` would never be called, this problem went unnoticed. However, with the addition of the `SqliteDosStorage` backend which is *not* read-only, this bug would surface when trying to store a node since certain methods rely on this method returning the node instance itself.
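Why callers break when `store` does not return the instance can be shown with a minimal sketch. The classes and the `store_and_get_label` helper below are hypothetical and only mimic the pattern; they are not the `SqliteEntityOverride` implementation.

```python
from typing import Optional


class EntitySketch:
    """Hypothetical entity whose ``store`` follows the base-class contract."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.is_stored = False

    def store(self) -> 'EntitySketch':
        self.is_stored = True
        return self  # returning the instance allows ``entity.store().label``


class BrokenEntitySketch(EntitySketch):
    """Hypothetical override that stores correctly but forgets to return ``self``."""

    def store(self) -> Optional['EntitySketch']:  # type: ignore[override]
        self.is_stored = True
        return None


def store_and_get_label(entity: EntitySketch) -> str:
    """Caller that relies on ``store`` returning the stored instance."""
    stored = entity.store()
    return stored.label
```

With `EntitySketch` the helper returns the label, whereas with `BrokenEntitySketch` it raises an `AttributeError` because `stored` is `None`, even though the entity itself was marked as stored.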
Fix `QueryBuilder.count` for storage backends using sqlite

The storage backends that use sqlite instead of PostgreSQL, i.e., `core.sqlite_dos`, `core.sqlite_temp` and `core.sqlite_zip`, piggyback off the ORM models defined by the `core.psql_dos` backend by dynamically converting to the sqlite equivalent database models. The current implementation of `SqlaGroup.count` would except when used with an sqlite backend since certain columns would be ambiguously defined:

    sqlite3.OperationalError: ambiguous column name: db_dbgroup.id

This is fixed by explicitly wrapping the classes that are joined in `sqlalchemy.orm.aliased` which will force sqlalchemy to properly alias each class, removing the ambiguity.

Tests: Remove deprecated `aiida/manage/tests/main` module

This module had been deprecated and replaced a long time ago in favor of `pytest` based fixtures that provide a complete testing environment with test profiles being created on-the-fly.

Tests: Move ipython magic tests to main unit test suite

The `.github/system_tests/test_ipython_magics.py` file provided tests for the ipython magics, however, these can simply be run in the main test suite invoked directly through `pytest`.

Tests: Move memory leak tests to main unit test suite

The `.github/system_tests/pytest/test_memory_leaks.py` file provided tests to ensure memory is not being leaked when running processes. These tests do not need to be executed in a standalone `pytest` invocation but can be included in the main unit test suite. Historically, the separation was required when the main unit test suite was not fully using `pytest` yet but used a framework based on `unittest`. With this migration, the last test in the `.github/workflows/tests.sh` script has been moved and now it merely calls the main test suite. The CI workflows that called it now simply invoke the command to run the main test suite directly, and the `tests.sh` script is deleted.
Pre-commit: Disable `no-member` and `no-name-in-module` for `aiida.orm` After the previous commit, for some unknown reason, `pylint` started throwing `no-member` and `no-name-in-module` warnings for import lines that import a class directly from `aiida.orm`. The imports actually work just fine and `pylint` didn't use to complain. The changes of the previous commit seem completely unrelated, so for now the warnings are ignored. Soon `pylint` will anyway be replaced by `ruff`. Docs: Various minor fixes to `run_docker.rst` (#6182) Some typos were reported by users. Dependencies: Update requirement `mypy~=1.7` (#6188) This allows to get rid of many exclude statements since those corresponded to bugs in `mypy` that have now been fixed. Add the `report` method to `logging.LoggerAdapter` (#6186) AiiDA defines the `REPORT` log level and adds the `report` method to the `logging.Logger` class so a log message can easily be emitted at that level. However, the logger of `Process` instances is a `LoggerAdapter` which does not inherit from `Logger` so the method also needs to be added there independently. Without this fix, calling `self.logger.report` in the `Parser.parse` method would raise an `AttributeError`. Docker: Add `rsync` and `graphviz` to system requirements The former is used for the backup functionality and the latter is needed to generate graphic representations of provenance graphs. Dependencies: Add upper limit `jedi<0.19` Certain tab completion functionality in ipython shells, for example the completion of `Node.inputs`, was broken for `jedi==0.19` in combination with recent version of `ipython`. Docker: Disable the consumer timeout for RabbitMQ (#6189) As of RabbitMQ v3.8.15, a default `consumer_timeout` is set of 30 minutes. If a task is not acknowledged within this timelimit, the consumer of the task is considered dead and its tasks are rescheduled. This is problematic for AiiDA since tasks often take multiple hours even. 
The `consumer_timeout` can only be changed through the server config. Here we disable it through the `advanced.config`.

Typing: Add overload signatures for `get_object_content`

Added for the `FolderData` and `NodeRepository` classes.

Typing: Add overload signatures for `open`

Added for the `FolderData` and `NodeRepository` classes. The signature of the `SinglefileData` was actually incorrect as it defined:

    t.Iterator[t.BinaryIO | t.TextIO]

as the return type, but it should really be:

    t.Iterator[t.BinaryIO] | t.Iterator[t.TextIO]

The former will cause `mypy` to raise an error.

Docs: Add changes of v2.4.1 to `CHANGELOG.md`

Docs: Update citation suggestions (#6184)

ORM: Filter inconsequential warnings from `sqlalchemy` (#6192)

Recently, the code was updated to be compatible with `sqlalchemy~=2.0`, which caused a lot of warnings to be emitted. As of `sqlalchemy==2.0.19` the `sqlalchemy.orm.unitofwork.UOWTransaction.register_object` method emits a warning whenever an object is registered that is not part of the session. See for details: https://docs.sqlalchemy.org/en/20/changelog/changelog_20.html#change-53740fe9731bbe0f3bb71e3453df07d3

This can happen when the session is committed or flushed and an object inside the session contains a reference to another object, for example through a relationship, that is not explicitly part of the session. If that referenced object is not already stored and persisted, it might get lost. On the other hand, if the object was already persisted before, there is no risk.

This situation occurs a lot in AiiDA's code base. A prime example is when a new process is created. Typically the input nodes are either already stored, or stored first. As soon as they get stored, the session is committed and the session is reset by expiring all objects. Now, the input links are created from the input nodes to the process node, and at the end the process node is stored to commit and persist it with the links.
It is at this point that sqlalchemy realises that the input nodes are not explicitly part of the session.

One direct solution would be to add the input nodes again to the session before committing the process node and the links. However, this code is part of the backend-independent :mod:`aiida.orm` module and this is a backend-specific problem. This is also just one example and there are most likely other places in the code where the problem arises. Therefore, as a workaround, a warning filter is put in place to silence this particular warning. Note that `pytest` undoes all registered warning filters, so it has to be added again in the `pytest` configuration in the `pyproject.toml`.

Add the `aiida.common.log.capture_logging` utility

The `capture_logging` utility is a context manager that yields an in-memory stream to which all content written to the specified logger is duplicated. This does not interfere with any existing logging handlers and so is non-destructive. It is useful for capturing logged output in memory in order to be able to act on it.

CLI: Add the `verdi process repair` command

This command replaces `verdi devel rabbitmq tasks analyze`. The original command was added to the `verdi devel` namespace because it was working around a problem and it was experimental. Since then, it has proved really effective and so should be made more directly available to users in case of stuck processes. The implementation is moved to `verdi process repair` and the original command simply forwards to it, while emitting a message that it is deprecated.

While the original command would not do anything by default and the `--fix` flag had to be explicitly specified, this behavior is inverted for `verdi process repair`. By default it will fix inconsistencies and the `--dry-run` flag can be used to get the old behavior of just detecting them.
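The behavior described for `capture_logging` above (duplicating a logger's output into an in-memory stream without touching existing handlers) can be sketched with the standard library alone. This is an illustrative sketch, not the implementation in `aiida.common.log`; the function signature and the demo logger name `capture_demo` are assumptions.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def capture_logging(logger: logging.Logger):
    """Yield an in-memory stream that receives a copy of everything logged to ``logger``."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # An extra handler is added; existing handlers are left untouched.
    logger.addHandler(handler)
    try:
        yield stream
    finally:
        # The temporary handler is removed again when the context exits.
        logger.removeHandler(handler)


logger = logging.getLogger('capture_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger

with capture_logging(logger) as stream:
    logger.info('process 42 is unreachable')

captured = stream.getvalue()
logger.info('logged after the context has exited')
```

Because the handler is only attached for the duration of the `with` block, messages logged afterwards no longer reach the stream, while any handlers that were already configured keep working throughout.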

CLI: Add repair hint to `verdi process play/pause/kill`

If a process' task was lost, the `verdi process play/pause/kill` commands will report the error:

    Error: Process<****> is unreachable.

If at least one of the processes is reported to be unreachable, the commands now log a message that suggests that the user run the command `verdi process repair` to repair all processes whose tasks were lost.

Docs: Add changes of v2.4.2 to `CHANGELOG.md`

Add support for `NodeLinksManager` to YAML serializer (#6199)

The `Node.inputs` and `Node.outputs` properties return instances of the `aiida.orm.utils.managers.NodeLinksManager` class. Support is added to the `aiida.orm.utils.serialize` YAML serializers such that these instances can now be stored in the context of `WorkChains`, as these are serialized to YAML for the checkpoints.

Process functions: Fix bug with variable arguments (#6201)

The process function implementation contained a bug where a function that specified variable positional arguments followed by keyword arguments would not be accepted. For example:

    def function(*args, arg_a, arg_b):
        pass

    function(*(1, 2), arg_a=3, arg_b=4)

is a perfectly valid function definition and call, but it would not work when decorated into a process function. Part of the problem was that the class attribute `_varargs` of the dynamically constructed `FunctionProcess` was used for the name of variable positional as well as variable keyword arguments. If both were defined, the former would be overridden by the latter. This is now split into `_var_positional` and `_var_keyword`, respectively.

The conversion of the original positional and keyword arguments passed to the function into the process input dictionary is simplified. The same holds for the reverse process, where the process inputs are converted back into positional and keyword arguments before passing them to the wrapped function.
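To illustrate why the names of variable positional and variable keyword arguments need to be tracked separately, as the process function commit above describes, here is a small sketch based on the standard-library `inspect` module. The helper `variable_parameter_names` is hypothetical and is not part of AiiDA; it only shows that a signature can carry both kinds of variable arguments at once, each with its own name.

```python
import inspect


def function(*args, arg_a, arg_b, **kwargs):
    """Example function with both variable positional and variable keyword arguments."""
    return args, arg_a, arg_b, kwargs


def variable_parameter_names(func):
    """Return the names of the ``*args``-style and ``**kwargs``-style parameters, if any."""
    var_positional = None
    var_keyword = None
    for name, parameter in inspect.signature(func).parameters.items():
        if parameter.kind is inspect.Parameter.VAR_POSITIONAL:
            var_positional = name
        elif parameter.kind is inspect.Parameter.VAR_KEYWORD:
            var_keyword = name
    return var_positional, var_keyword


names = variable_parameter_names(function)
result = function(*(1, 2), arg_a=3, arg_b=4, extra=5)
```

Storing both names in a single attribute would keep only whichever was assigned last, which matches the overriding behavior the commit message describes.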

ORM: Implement the `Dict.get` method (#6200)

This makes the behavior of `Dict` identical to that of a plain `dict` with respect to this method. The `Dict` class inherited the `get` method from the `Entity` base class, but that has a completely different purpose that is not of interest for users of the `Dict`…
In #6140 we've tried to speed up verdi invocation by lazy loading config / profile. Unfortunately, the configuration is still being loaded during tab-completion.
After a fair amount of going through the code in both click and aiida, I now think this is a bug in click, see pallets/click#2614. I've devised a fix that passes the click test suite, so it seems that the current behavior is unintended.
I will submit a PR to click, but given the speedup that we stand to gain (~50ms) for such a time-sensitive thing as tab-completion, I think it is worth fixing here for the next aiida-core release, which I suspect will happen before the next click release.
I've verified that with this fix, the profile is indeed not being loaded during tab-completion, by sticking

    raise ValueError

in `aiida.manage.configuration.get_config()` and observing that it raises on main and does not raise here during tab-completion.

It would be great to have a regression test for this, but I am not sure how to do it. Here's how click tests it: https://github.com/pallets/click/blob/main/tests/test_shell_completion.py
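For readers unfamiliar with the mechanism, the following library-free sketch illustrates the general idea of not evaluating a callable default unless asked to. All class and function names here are hypothetical stand-ins and do not reproduce the click or aiida-core implementation. It shows how an override of `get_default` that fails to forward a `call` flag ends up invoking the default anyway, and how forwarding the flag avoids that.

```python
class Parameter:
    """Stand-in for a CLI parameter whose default may be a callable."""

    def __init__(self, default=None):
        self.default = default

    def get_default(self, call=True):
        # The callable is only evaluated when the caller asks for it.
        if call and callable(self.default):
            return self.default()
        return self.default


class BuggyOption(Parameter):
    def get_default(self, call=True):
        # ``call`` is dropped here, so the base class falls back to ``call=True``.
        return super().get_default()


class FixedOption(Parameter):
    def get_default(self, call=True):
        # ``call`` is forwarded, so ``call=False`` is respected.
        return super().get_default(call=call)


loaded = []


def load_profile():
    """Hypothetical expensive default, standing in for loading the configuration."""
    loaded.append('profile')
    return 'default-profile'


BuggyOption(default=load_profile).get_default(call=False)
calls_with_bug = len(loaded)

loaded.clear()
FixedOption(default=load_profile).get_default(call=False)
calls_with_fix = len(loaded)
```

A regression test could follow the same pattern: replace the expensive loader with something that records or raises when invoked, trigger completion, and assert that it was never called.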