FIX: Do not evaluate callable defaults during tab-completion #6144

Merged
merged 3 commits into from
Oct 15, 2023

Conversation

Collaborator

@danielhollas danielhollas commented Oct 11, 2023

In #6140 we've tried to speed up verdi invocation by lazy loading config / profile. Unfortunately, the configuration is still being loaded during tab-completion.

After a fair amount of going through the code in both click and aiida, I now think this is a bug in click, see pallets/click#2614. I've devised a fix that passes the click test suite, so it seems that the current behavior is unintended.

I will submit a PR to click, but given the speedup that we stand to gain (~50ms) for such a time-sensitive thing as tab-completion, I think it is worth fixing here for the next aiida-core release, which I suspect will happen before the next click release.

I've verified that with this fix the profile is indeed not loaded during tab-completion, by sticking a raise ValueError into aiida.manage.configuration.get_config() and observing that it raises on main and does not raise here during tab-completion.
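To illustrate the idea, here is a minimal stdlib-only toy model. It does not use click's real classes: `Context`, `Option`, `resolve` and `expensive_default` are hypothetical stand-ins. The only things borrowed from click are the `resilient_parsing` flag on the context and the `call` argument of `get_default`, which decides whether a callable default is actually invoked.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

calls = []  # records every evaluation of the expensive default


def expensive_default() -> str:
    """Stand-in for a default that loads the configuration."""
    calls.append('loaded')
    return 'default-profile'


@dataclass
class Context:
    """Hypothetical context; ``resilient_parsing`` is True during completion."""
    resilient_parsing: bool = False


@dataclass
class Option:
    """Hypothetical option whose default may be a callable."""
    default: Union[Any, Callable[[], Any]]

    def get_default(self, ctx: Context, call: bool = True) -> Any:
        # Return the callable itself, unevaluated, when asked not to call it.
        if callable(self.default) and not call:
            return self.default
        return self.default() if callable(self.default) else self.default

    def resolve(self, ctx: Context) -> Any:
        # During completion the value is not needed, so skip the evaluation.
        return self.get_default(ctx, call=not ctx.resilient_parsing)


option = Option(default=expensive_default)
option.resolve(Context(resilient_parsing=True))  # tab-completion: no call
completion_calls = len(calls)
value = option.resolve(Context(resilient_parsing=False))  # normal invocation
```

In this sketch the expensive default is only evaluated on the normal invocation, which is the behavior this PR aims for with the profile option.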

It would be great to have a regression test for this, but I am not sure how to do it. Here's how click tests it: https://github.com/pallets/click/blob/main/tests/test_shell_completion.py

@@ -158,7 +158,7 @@ def get_default(self, ctx: click.Context, call: bool = True) -> t.Optional[t.Uni
         if self._contextual_default is not None:
             default = self._contextual_default(ctx)
         else:
-            default = super().get_default(ctx)
+            default = super().get_default(ctx, call=call)
Collaborator Author

Unrelated fix

@danielhollas
Collaborator Author

@sphuber this is ready for review

@sphuber sphuber self-requested a review October 11, 2023 19:12
Contributor

@sphuber sphuber left a comment

Thanks @danielhollas . As for the test, maybe you could try something like the following:

# -*- coding: utf-8 -*-
###########################################################################
# Copyright (c), The AiiDA team. All rights reserved.                     #
# This file is part of the AiiDA code.                                    #
#                                                                         #
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
# For further information on the license, see the LICENSE.txt file        #
# For further information please visit http://www.aiida.net               #
###########################################################################
# pylint: disable=redefined-outer-name
"""Tests for the :mod:`aiida.cmdline.params.options.callable` module."""
import pytest

from click.shell_completion import ShellComplete

from aiida.cmdline.commands.cmd_verdi import verdi


def _get_completions(cli, args, incomplete):
    comp = ShellComplete(cli, {}, cli.name, '_CLICK_COMPLETE')
    return comp.get_completions(args, incomplete)


@pytest.fixture
def unload_config():
    """Temporarily unload the config by setting ``aiida.manage.configuration.CONFIG`` to ``None``."""
    from aiida.manage import configuration

    config = configuration.CONFIG

    try:
        configuration.CONFIG = None
        yield
    finally:
        configuration.CONFIG = config


@pytest.mark.usefixtures('unload_config')
def test_callable_default_resilient_parsing():
    """Test that tab-completion of ``verdi`` does not evaluate defaults that load the config, which is expensive."""
    from aiida.manage import configuration

    assert configuration.CONFIG is None
    [c.value for c in _get_completions(verdi, [], '')]
    assert configuration.CONFIG is None

This fails on the main branch, as it should. If it passes on your branch, I would say this provides some assurance that it is working as intended.

@@ -145,6 +146,7 @@ def set_log_level(_ctx, _param, value):
     'profile',
     type=types.ProfileParamType(),
     default=defaults.get_default_profile,
+    cls=CallableDefaultOption,
Contributor

Is this really the only option that has an expensive callable default?

Collaborator Author

It's the only place where the config/profile is loaded; all the others use the InteractiveOption, where this is already handled. You are right that there are likely other expensive defaults, though. I plan to look into this in a follow-up PR, where I will also look at the timings more closely.

@danielhollas danielhollas force-pushed the fix/verdi-autocomplete branch from 631cf97 to e265067 Compare October 13, 2023 08:57
Contributor

sphuber commented Oct 13, 2023

One more place that loaded the config

Seems like the test is doing its job 👍

Collaborator Author

@danielhollas danielhollas left a comment

Haha, indeed, thank you very much for the test! This is now ready from my side.

from aiida.manage import configuration

config = configuration.CONFIG
configuration.CONFIG = None
Collaborator Author

NOTE: I have removed the try-finally block. I don't think it is necessary: pytest should ensure that the fixture is run to completion after the test, unless the fixture itself excepts before the yield point, but here we only have two assignments.

https://docs.pytest.org/en/latest/how-to/fixtures.html#teardown-cleanup-aka-fixture-finalization

https://docs.pytest.org/en/latest/how-to/fixtures.html#safe-teardowns
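As a rough illustration of why the teardown still runs, here is a toy, stdlib-only model of a runner driving a generator-style fixture. It is not pytest's actual implementation; `run_with_fixture`, `state` and `failing_test` are hypothetical names used only for this sketch.

```python
from typing import Callable, Iterator

state = {'CONFIG': 'loaded-config'}  # stand-in for a module-level global


def unload_config() -> Iterator[None]:
    """Generator-style fixture: setup before ``yield``, teardown after."""
    original = state['CONFIG']
    state['CONFIG'] = None
    yield
    state['CONFIG'] = original


def run_with_fixture(fixture: Callable[[], Iterator[None]], test: Callable[[], None]) -> None:
    """Toy runner: resume the generator after the test, even if the test raised."""
    gen = fixture()
    next(gen)  # run the setup part, up to the yield
    try:
        test()
    finally:
        next(gen, None)  # run the teardown part, after the yield


def failing_test() -> None:
    assert state['CONFIG'] is None
    raise RuntimeError('test failed')


try:
    run_with_fixture(unload_config, failing_test)
except RuntimeError:
    pass  # the test failed, but the teardown has already restored the state
```

Because the runner, not the fixture, is responsible for resuming the generator, the fixture body itself needs no try-finally as long as nothing before the `yield` can raise.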


assert configuration.CONFIG is None
completions = [c.value for c in _get_completions(verdi, [], '')]
assert 'help' in completions
Collaborator Author

I noticed that we do not test autocompletion anywhere else in the test suite. I'll try adding more tests in a separate PR; for now I added at least this simple assert.
(also to shut up pylint, which was complaining about an unassigned expression)

@danielhollas danielhollas requested a review from sphuber October 13, 2023 12:37
Comment on lines +116 to +117
+    if not _ctx.resilient_parsing:
+        configure_logging()
Contributor

I am a bit surprised by this change. This function set_log_level is only assigned as the callback of the VERBOSITY option. I don't think this is supposed to be called during tab-completion anyway. I just tested this and it indeed doesn't seem to be called during tab-completion. Was this the part of the code that caused the new test to fail? Do you understand why?

Collaborator Author

Good catch, and sorry for not being clear. I am also confused, but you can try it: when you remove it, the test fails. Yet when I test the completion on the actual command line, it is not called. Maybe the click function used in the test is not exactly the one that gets called? Btw: I was testing on Bash, I wonder if other shells may behave differently.

Collaborator Author

Perhaps there is some weird interaction with the test suite. Not sure if it is worth deeper investigation, since the change itself seems like an okay thing to do on its own.

Contributor

Figured it out. I tested whether this was being called when actually tab-completing verdi by adding a print statement. Since that print statement never showed up, I concluded that the function wasn't being called. But that is not true. It was actually called, but during tab-completion, all output to sys.stdout is captured and so I didn't see anything. Printing to sys.stderr would actually show, or simply raising an exception would confirm the function was being called.
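The pitfall can be reproduced in isolation with a small stdlib-only sketch. It only simulates stdout being swallowed, using `contextlib.redirect_stdout`; `callback` is a hypothetical stand-in for the option callback, and stderr is redirected here as well purely so that both streams can be inspected afterwards.

```python
import contextlib
import io
import sys


def callback() -> None:
    """Hypothetical callback that reports on both output streams."""
    print('callback was called')                   # written to stdout
    print('callback was called', file=sys.stderr)  # written to stderr


captured_out = io.StringIO()
captured_err = io.StringIO()

# Simulate an environment in which stdout is captured by someone else.
with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    callback()

# The stdout message never reached the terminal, yet the function did run.
stdout_text = captured_out.getvalue()
stderr_text = captured_err.getvalue()
```

A missing print on the terminal therefore says nothing about whether the function ran. Raising an exception, or writing to a stream that is not captured, is the more reliable probe.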

Contributor

@sphuber sphuber left a comment

Thanks @danielhollas

@sphuber sphuber merged commit 0620588 into aiidateam:main Oct 15, 2023
@danielhollas danielhollas deleted the fix/verdi-autocomplete branch October 15, 2023 15:14
khsrali pushed a commit to khsrali/aiida-core that referenced this pull request Jan 16, 2025
Update test_execmanager.py

`CalcJobNode`: Fix validation for `depth=None` in `retrieve_list` (#6078)

Commit a1b9f79a97c5e987aa900c1db3258339abaa6aa3 added support for using
`None` as the third element in a directive of the `retrieve_list` of a
`CalcJob`. However, the method `_validate_retrieval_directive` that
validates the retrieve list directives when stored on the `CalcJobNode`
was not updated and would only still accept integers.

update run methods

CLI: Fix bug in `verdi data core.trajectory show` for various formats (#5394)

These minor bugs went unnoticed because the methods are wholly untested.
This is partly because they rely on additional Python modules or external
executables. For the formats that rely on external executables, i.e.,
`jmol` and `xcrysden`, the `subprocess.check_output` function is
monkeypatched to prevent the actual executable from being called. This
tests all code except for the actual external executable, which at least
gives coverage of our code.

The test for `mpl_pos` needed to be monkeypatched as well. This is
because the `_show_mpl_pos` method calls `plot_positions_xyz` which
imports `matplotlib.pyplot` and for some completely unknown reason, this
causes `tests/storage/psql_dos/test_backend.py::test_unload_profile` to
fail. For some reason, merely importing `matplotlib` (even here directly
in the test) will cause that test to claim that there still is something
holding on to a reference of an sqlalchemy session that it keeps track
of in the `sqlalchemy.orm.session._sessions` weak ref dictionary. Since
it is impossible to figure out why the hell importing matplotlib would
interact with sqlalchemy sessions, the function that does the import is
simply mocked out for now.

Co-authored-by: Sebastiaan Huber <[email protected]>

ORM: Check nodes are from same backend in `validate_link` (#5462)

Tests: Fix `StructureData` test breaking for recent `pymatgen` versions (#6088)

The roundtrip test for the `StructureData` class using `pymatgen`
structures as a go between started failing. The structure is constructed
from a CIF file with partial occupancies. The `label` attribute of each
site in the pymatgen structure, as returned by `as_dict` would look like
the following, originally:

    ['Bi', 'Bi', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333']
    ['Bi', 'Bi', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333']

In commit 63bbd23b57ca2c68eaca07e4915a70ef66e13405, released with
v2023.7.14, the CIF parsing logic in `pymatgen` was updated to include
parsing of the atom site labels and store them on the site `label`
attribute. This would result in the following site labels for the
structure parsed directly from the CIF and the one after roundtrip
through `StructureData`:

    ['Bi', 'Bi', 'Se1', 'Se1', 'Se1']
    [None, None, None, None, None]

The roundtrip returned `None` values because in the previously mentioned
commit, the newly added `label` property would return `None` instead of
the species label that used to be returned before. This behavior was
corrected in commit 9a98f4ce722299d545f2af01a9eaf1c37ff7bd53 and released
with v2023.7.20, after which the new behavior is the following:

    ['Bi', 'Bi', 'Se1', 'Se1', 'Se1']
    ['Bi', 'Bi', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333', 'Te:0.667, Se:0.333']

The site labels parsed from the CIF are not maintained in the roundtrip
because the `StructureData` does not store them. Therefore when the final
pymatgen structure is created from it, the `label` is `None` and so
defaults to the species name.

Since the label information is not persisted in the `StructureData` it
is not guaranteed to be maintained in the roundtrip and so it is
excluded from the test.

Devops: Update pre-commit requirement `flynt==1.0.1` (#6093)

Docs: Fix typo in `run_codes.rst` (#6099)

Improve type hinting for `aiida.orm.nodes.data.singlefile`

`SinglefileData`: Add `mode` keyword to `get_content`

This allows a user to retrieve the content in bytes. Currently, a user
is forced to use the more elaborate form:

    with singlefile.open(mode='rb') as handle:
        content = handle.read()

or go directly through the repository interface which is a bit hidden
and requires to redundantly specify the filename:

    content = singlefile.base.repository.get_object_content(
        singlefile.filename,
        mode='rb'
    )

these variants can now be simplified to:

    content = singlefile.get_content('rb')

`RemoteData`: Add the `is_cleaned` property (#6101)

This is a convenience method that will return the `KEY_EXTRA_CLEANED`
extra, which is set to `True` when the `clean` method is called. The
`is_empty` method is also updated to use this new property and shortcut
if set to `True`. This saves the method from having to open a transport
connection.

Docs: Add links about "entry point" and "plugin" to tutorial (#6095)

The tutorial was missing an explanation of where the entry point for the workflow came from, and how users can write their own plugins and make them available via an entry point.

---------

Co-authored-by: Leopold Talirz <[email protected]>
Co-authored-by: Jusong Yu <[email protected]>

Lazily define `_plugin_type_string` and `_query_type_string` of `Node`

These class attributes require a look up whether the `Node` class has a
registered entry point which can have a non-negligible cost. These
attributes were defined in the `AbstractNodeMeta` class, which is the
metaclass of the `Node` class, which would cause this code to be
executed as soon as the class was imported.

Here, the `AbstractNodeMeta` metaclass is removed. The
`_plugin_type_string` and `_query_type_string` class attributes are
changed to class properties. The actual value is stored in the private
attribute analog which is defined lazily the first time the property is
accessed.
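A minimal sketch of the lazy class-property pattern follows. It is stdlib-only and not the actual `aiida-core` code: the `classproperty` descriptor, `expensive_lookup` and the returned strings are hypothetical stand-ins for the entry point lookup.

```python
class classproperty:
    """Minimal descriptor that calls the wrapped function with the owner class."""

    def __init__(self, getter):
        self.getter = getter

    def __get__(self, instance, owner):
        return self.getter(owner)


lookups = []  # records every expensive lookup that is performed


def expensive_lookup(cls) -> str:
    """Stand-in for the entry point lookup."""
    lookups.append(cls.__name__)
    return f'data.{cls.__name__.lower()}.'


class Node:
    @classproperty
    def plugin_type_string(cls) -> str:
        # Check the class's own __dict__ so subclasses do not reuse the parent's value.
        if '_plugin_type_string' not in cls.__dict__:
            cls._plugin_type_string = expensive_lookup(cls)
        return cls._plugin_type_string


class Data(Node):
    pass


lookups_after_definition = list(lookups)  # nothing has been looked up yet
first = Node.plugin_type_string   # triggers the lookup for Node
second = Node.plugin_type_string  # served from the cached private attribute
sub = Data.plugin_type_string     # triggers a separate lookup for Data
```

Defining the classes costs nothing; the lookup happens once per class, on first access.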

Lazily define `__type_string` in `orm.Group`

This is a follow-up on previous commit aiming to speedup the import of
the `aiida.orm` by avoiding costly entry point lookups.

Here we completely remove the `GroupMeta` metaclass and move its logic
into the `_typestring` classproperty, which avoids the code being
executed on import while being backwards compatible.

Do not import `aiida.cmdline` in `aiida.orm`

Remove `with_dbenv` use in `aiida.orm`

This forces the import of `aiida.cmdline` in `aiida.orm` which doesn't
just slow down, but also is conceptually wrong. The problem of the
`with_dbenv` decorator is also that it cannot be imported inside a
method to avoid the import cost when importing `aiida.orm` but has to be
imported at the top in order to be used.

Docs: Improvements to sections containing recently added functionality (#6090)

* Daemon API
* Processes API
* Multiple profile serving for REST API
* Controlling MPI when creating a `Code`

Devops: Update `pyproject.toml` configuration (#6085)

Added stricter rules for `mypy` and `pytest`. Suggestions taken after
automated analysis by the following tool:
https://learn.scientific-python.org/development/guides/repo-review/

Caching: Try to import an identifier if it is a class path

Raise a `ValueError` if the identifier cannot be imported, which will
help prevent accidental typos from making it appear that the caching
configuration is being ignored.

Caching: Add the `strict` argument configuration validation

So far, the caching configuration validation only considered whether the
defined identifiers were valid syntactically. This made it possible for
a user to specify a syntactically valid identifier that didn't actually
match a class that can be imported or an entry point that can be loaded.
If this is due to a typo, the user may be confused why the caching config
seems to be ignored.

The caching control functionality adds the `strict` argument, which when
set to `True`, besides checking the syntax validity of an identifier,
will also try to import/load it and raise a `ValueError` if it fails. By
default it is set to `False` to maintain backwards compatibility.
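The difference between the syntactic check and the strict check can be sketched as follows. This is an illustration only: `validate_identifier` is a hypothetical helper, not the `aiida-core` function, and it handles only dotted class paths, not entry point strings.

```python
import importlib
import re


def validate_identifier(identifier: str, strict: bool = False) -> None:
    """Hypothetical validator for a dotted class path such as ``package.module.Class``."""
    if not re.fullmatch(r'[A-Za-z_]\w*(\.[A-Za-z_]\w*)+', identifier):
        raise ValueError(f'`{identifier}` is not a syntactically valid class path')

    if not strict:
        return  # backwards-compatible behavior: syntax check only

    module_name, _, class_name = identifier.rpartition('.')
    try:
        module = importlib.import_module(module_name)
        getattr(module, class_name)
    except (ImportError, AttributeError) as exc:
        raise ValueError(f'`{identifier}` cannot be imported') from exc


# A typo passes the syntax-only check ...
validate_identifier('collections.OrderedDcit')

# ... but is caught once strict validation is requested.
try:
    validate_identifier('collections.OrderedDcit', strict=True)
    typo_detected = False
except ValueError:
    typo_detected = True
```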

CLI: Remove loading backend for `verdi plugin list`

This command doesn't need the storage backend and loading it adds
significant unnecessary run time.

CLI: Add missing entry point groups for `verdi plugin list`

The following groups were not part of the entry point group mapping:

 * `aiida.calculations.monitors`
 * `aiida.calculations.importers`

As a result, they were not available as subcommands to
`verdi plugin list`.

Refactor: Delay import of heavy packages to speed up import time (#6106)

The imports of the packages `disk_objectstore`, `jsonschema`, `requests`,
`plumpy` and `paramiko` are moved from top-level to inside the scopes
where they are needed. This significantly improves the load time of the
`aiida` package and its subpackages.

The `ProcessLauncher` utility had to be removed from resources that are
exposed at a higher package level because it required the import of
`plumpy` which has a non-negligible import time. This is a breaking
change but it is not expected to be used outside of `aiida-core`.

Tests: Fix flaky work chain tests using `recwarn` fixture (#6112)

The tests were often failing because the `recwarn` fixture contained
two records instead of one. The reason is that elsewhere in the code a
`ResourceWarning` is emitted because an event-loop is not closed when
another one is created. Until this is fixed, the assertion is updated to
not check for the number of warnings emitted, but specifically to check
the expected warning message is present.
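The more robust style of assertion can be sketched with the standard `warnings` module instead of the `recwarn` fixture. The two emitting functions are hypothetical stand-ins for the code under test and for the unrelated `ResourceWarning`.

```python
import warnings


def deprecated_call() -> None:
    """Stand-in for the code under test, which emits the expected warning."""
    warnings.warn('this method is deprecated', DeprecationWarning)


def unrelated_side_effect() -> None:
    """Stand-in for unrelated code that emits a ``ResourceWarning``."""
    warnings.warn('unclosed event loop', ResourceWarning)


with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter('always')  # make sure no warning is filtered out
    unrelated_side_effect()
    deprecated_call()

messages = [str(record.message) for record in records]

# Fragile: ``assert len(records) == 1`` breaks as soon as anything else warns.
# Robust: only require that the expected message is among the records.
expected_found = any('is deprecated' in message for message in messages)
```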

ORM: Add the `Entity.get_collection` classmethod

This is an alternative for the `collection` class property. Where the
`collection` property uses the current default storage backend, the
`get_collection` method allows specifying another backend.

The original design intended to support this use-case by allowing a
`Collection` instance to be "called" with a specific backend:

    Entity.collection(backend)

The `.collection` returns a `Collection` instance which is then called
with passing the `backend`, which would return a new `Collection` for
the same entity type, but with the other backend. However, this doesn't
always work, because the `collection` property will load the backend
from the default profile, which will except if not loaded. Although this
scenario is unlikely for normal usage, application developers may use
AiiDA's API with multiple active backends where no default profile is
defined.

The reason for a new method instead of changing the `collection`
property is that that would be backwards incompatible.

ORM: Replace `.collection(backend)` with `.get_collection(backend)`

The classproperty will try to load the default storage backend first,
before recreating the collection with the specified backend. Not only is
this inefficient as the collection is recreated if the `backend` is not
the current default one, but it can also fail in situations where no
default profile is available and the caller wants to directly specify
the backend.

ORM: Explicitly pass backend when constructing new entity

Whenever an ORM entity instance instantiates another entity, it should
explicitly pass its own backend as the storage backend to use. Similarly,
functions that accept a storage backend as an argument should
consistently pass it on whenever instantiating a new entity or its collection.

Co-Authored-By: Riccardo Bertossa <[email protected]>

Docs: Add links to Discourse server (#6111)

The Discourse server replaces the mailing list and Slack channel.
The README and documentation is updated accordingly.

Add type hinting for `aiida.orm.nodes.data.array.array`

`ArrayData`: Allow defining array(s) on construction

Currently, the constructor does not allow defining any arrays to set
when constructing a new node, so one is forced to write multi-line code:

    node = ArrayData()
    node.set_array('a', np.array([1, 2]))
    node.set_array('b', np.array([3, 4]))

This commit allows initialization upon construction simplifying the code
above to:

    node = ArrayData({'a': np.array([1, 2]), 'b': np.array([3, 4])})

Note that it is also possible to pass a single array to the constructor,
in which case the array name is taken from the `default_array_name`
class attribute.

For backwards compatibility, it remains possible to construct an
`ArrayData` without any arrays.

`ArrayData`: Make `name` optional in `get_array`

The `ArrayData` was designed to be able to store multiple numpy arrays.
While useful, it forced users to be more verbose than necessary when
only storing a single array as an explicit array name is always required:

    node = ArrayData()
    node.set_array('some_key', numpy.array([]))
    node.get_array('some_key')

The `get_array` method is updated to allow `None` for the `name`
argument as long as the node only stores a single array so that it can
return the correct array unambiguously. This simplifies typical user
code significantly:

    node = ArrayData(numpy.array([]))
    node.get_array()

Add type hinting for `aiida.orm.nodes.data.array.xy`

`XyData`: Allow defining array(s) on construction

Currently, the constructor does not allow defining any arrays to set
when constructing a new node, so one is forced to write multi-line code:

    node = XyData()
    node.set_x(np.array([1, 2]), 'name', 'unit')
    node.set_y(np.array([3, 4]), 'name', 'unit')

This commit allows initialization upon construction simplifying the code
above to:

    node = XyData(
        np.array([1, 2]),
        np.array([3, 4]),
        x_name='name',
        x_unit='unit',
        y_names='name',
        y_units='unit'
    )

The units and names are intentionally made into keyword argument only
in order to prevent accidental swapping of values.

For backwards compatibility, it remains possible to construct an
`XyData` without any arrays.

Tests: Make `PsqlDosStorage` profile unload test more robust (#6115)

The `test_unload_profile` test verifies that if a loaded profile is
unloaded, it properly relinquishes the session that is maintained by
sqlalchemy. It did so by checking that after unloading, there were no
sessions being referenced. However, this would fail sometimes, because
another session may still be held on to, even though that session had
nothing to do with the test.

A more robust test is simply to check that after unloading, there is
exactly one less session being held on to.

Dependencies: Add compatibility for `pymatgen>=v2023.9.2` (#6109)

As of v2023.9.2, the ``properties`` argument of the `Specie` class is
removed and the ``spin`` argument should be used instead. See:
https://github.com/materialsproject/pymatgen/commit/118c245d6082fe0b13e19d348fc1db9c0d512019

The ``spin`` argument was introduced in v2023.6.28. See:
https://github.com/materialsproject/pymatgen/commit/9f2b3939af45d5129e0778d371d814811924aeb6

Instead of removing support for versions older than v2023.6.28 the code
is updated to be able to deal with the new version where `properties` is
no longer supported.

`BaseRestartWorkChain`: Factor out attachment of outputs (#5983)

When a work chain step returns an exit code, the work chain execution is
aborted. A common use for the handlers of the `BaseRestartWorkChain` is
exactly this, to stop work chain execution when a particular problem or
situation is detected.

The downside is that no other steps can be called by the work chain
implementation, for example, the `results` step to still attach any
(partial) results. Of course an implementation could copy the content of
the `results` method in the handler to do so, but it would have to copy
the contents in each handler that still wanted to attach the outputs,
duplicating the work.

Here, the actual attaching of the outputs is factored out of the
`results` method to the `attach_outputs` method. This method can now
easily be called inside a process handler that wants to attach outputs
before returning an exit code to stop the work chain.

`CalcJob`: Add support for nested targets in `remote_symlink_list` (#5974)

It is now possible to specify a target in the `remote_symlink_list` that
contains nested directories that do not necessarily exist. The
`upload_calculation` will automatically create them before creating the
symlink.

Improved Docker images (#6080)

The current Docker image provided with `aiida-core` depends on
`aiida-prerequisites` as a base image. This image is maintained outside
of the `aiida-core` repo, making it additional maintenance to keep it up
to date when a new `aiida-core` version is released. In addition, the
`aiida-prerequisites` image is no longer maintained because the AiiDAlab
stack now depends on another base image.

Finally, the `aiida-prerequisites` design had shortcomings as to how the
required services, PostgreSQL and RabbitMQ, are handled. They had to be
started manually and were not cleanly stopped on container shutdown.

An AEP was submitted to add two Docker images to `aiida-core` that
simplify their maintenance and improve the usability by properly
and automatically handling the services. See the AEP for more details:
https://aep.readthedocs.io/en/latest/009_improved_docker_images/readme.html

Docs: Correct example of `verdi config unset` in troubleshooting (#6118)

Devops: Upload artifact by PR from forks for docker workflow (#6119)

Refactor: Delay import of heavy packages to speed up import time (#6116)

The imports of the packages `urllib`, `yaml` and `pgsu` are moved from
top-level to inside the scopes where they are needed. This significantly
improves the load time of the `aiida` package and its subpackages.

The import of `Transport` in the `aiida.engine` package also slows down
imports, but it is only used for type checking, so its import is placed
inside the `if TYPE_CHECKING` guard.
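Both techniques can be sketched in a few lines. The stdlib `fractions` module is used here purely as a stand-in for a heavy dependency, and `parse_ratio` is a hypothetical function.

```python
import sys
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never executed at runtime.
    from fractions import Fraction


def parse_ratio(text: str) -> 'Fraction':
    """Parse a ratio; the import cost is only paid when this function is called."""
    from fractions import Fraction  # deferred import, local to the function

    return Fraction(text)


ratio = parse_ratio('3/4')
module_loaded = 'fractions' in sys.modules  # loaded at the latest by the call above
```

The return annotation is written as a string so that it does not need the name `Fraction` to exist at module level at runtime.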

Finally, the `DEFAULT_DBINFO`, `Postgres` and `PostgresConnectionMode`
objects of the `aiida.manage.external.postgres` package are no longer
exposed on the top-level as this also slows down imports. This is a
breaking change technically, but these resources should not be used by
downstream packages.

Devops: Use Python 3.10 for `pre-commit` in CI and CD workflows (#6121)

Typing: Improve annotations of process functions (#6077)

Docs: Update `pydata-sphinx-theme` and add Discourse links (#6120)

The `pydata-sphinx-theme` is updated to `v0.13.3` or higher, which is
the latest release. This changes the style a bit, but mostly for the
better.

It also allows using custom icon links in the top-right header. This is
used to add a link to the AiiDA home page (using the AiiDA icon) and add
a link to the new Discourse server. Another admonition box is added to
the landing page that directs users needing support to Discourse.

`InteractiveOption`: Fix validation being skipped if `!` provided

The `InteractiveOption` reserves the exclamation point `!` as a special
character in order to "skip" the option and not define it. This is
necessary for options that are _not_ required but that do specify a
default. Without this character, it is impossible to _not_ set the
default as, if the user doesn't specify anything specific and simply
presses enter, the default is taken, even if the user did not want to
specify any specific value.

The problem with the implementation, however, is that when `!` was
provided, the option would return `None` as the value, bypassing any
validation that might be defined. This would make it possible to bypass
the validation of a required option.

The solution is that, when `!` is provided for an interactive option, it
is translated to `None` and then processed as normal, validating it
as any other value.
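The corrected flow can be sketched as a plain function. This is an illustration under simplifying assumptions, not the `InteractiveOption` implementation: `process_value` and its validation rule are hypothetical.

```python
from typing import Optional

SKIP_CHARACTER = '!'  # reserved input meaning "explicitly set no value"


def process_value(raw: str, required: bool, default: Optional[str] = None) -> Optional[str]:
    """Translate the raw prompt input to a value and always validate the result."""
    if raw == '':
        value = default  # pressing enter accepts the default
    elif raw == SKIP_CHARACTER:
        value = None     # "!" is translated to None instead of returning early
    else:
        value = raw

    # Validation runs for every value, including the None produced by "!".
    if required and value is None:
        raise ValueError('value has to be specified')

    return value


skipped = process_value('!', required=False, default='localhost')
accepted = process_value('', required=False, default='localhost')

try:
    process_value('!', required=True, default='localhost')
    bypassed = True
except ValueError:
    bypassed = False
```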

CLI: Usability improvements for interactive `verdi setup`

There were a number of ways that a user could break the command by
providing incorrect input that was not caught by validation:

 * The following options are now required and can no longer incorrectly
   be skipped with `!`: `user_email`, `user_first_name`, `user_last_name`,
   `user_institution`, `db_engine`, `db_backend`, `db_host`, `db_port`
   and the `repository_path`.
 * For a missing parameter in interactive mode the error now reads:
      Error: Institution has to be specified
   Instead of:
      Error: Missing parameter: institution
   which should be more intuitive
 * The message that a profile has successfully been created is now only
   displayed if the storage backend initialized successfully. Before,
   this was shown before storage initialization, which could then still
   fail, making the success message confusing.

Devops: Replace outdated link in issue template (#6123)

It was still pointing to the legacy Google mailing list. The link is
updated to point to the Discourse server.

`SqliteTempBackend`: Add support for reading from and writing to archives (#5658)

To this end, the `bulk_insert` and `bulk_update` are implemented. The
archive creation and import functionality currently requires that the
repository of the storage backend uses a SHA256 hash for the keys of the
objects. This is not the case for the `SandboxRepositoryBackend` that
the `SqliteTempBackend` uses. Therefore, the `SandboxRepositoryBackend`
is subclassed to `SandboxShaRepositoryBackend` which replaces the UUID
key of its parent and uses a SHA256 instead.

Dependencies: Add new extra `tui` that provides `verdi` as a TUI (#6071)

The `tui` extra installs `trogon`. This package leverages `textual` to
turn `verdi`'s `click` interface into a Text-based User Interface (TUI).
It is added only if `trogon` is installed and can be imported. When it
is installed, it adds the `verdi tui` command, which launches the
text-based interface of `verdi`.

Co-authored-by: Sebastiaan Huber <[email protected]>

Docs: Add important note on using `iterall` and `iterdict` (#6126)

Using the `all` and `dict` equivalents is very inefficient for large
query results and will lead to performance problems.

CLI: Fix `repository` being required for `verdi quicksetup` (#6129)

Regression added by c53ea20a497f66bc88f68d0603cf9a32614fc4c2 which made
the `--repository` option for `verdi setup` required, as it should be.
However, it did so by making the base option required. The problem is
that the option for both `verdi setup` as well as `verdi quicksetup`
inherit from this, but for `verdi quicksetup` it should not be required
as the default will be populated automatically. As an alternative, the
option specific for `verdi setup` is now made required.

Devops: Follow-up docker build runner macOS-ARM64 (#6127)

The buildjet arm64 runner is only available as a three-month trial,
after which we need to pay to use it. The self-hosted runner is deployed
on the macOS-arm64 machine located at PSI.

`PsqlDosBackend`: Fix `Node.store` excepting when inside a transaction (#6125)

Calling `Node.store` with the `PsqlDosBackend` would except whenever
inside a transaction, for example, when iterating over a `QueryBuilder`
result, which opens a transaction.

The reason is that the node implementation of the `PsqlDosBackend`, the
`SqlaNode.store` method, calls `commit` on the session. This closes the
current transaction, and so when it is then used again, for example in
the next iteration of the builder results, an exception is raised by
sqlalchemy complaining that the transaction was closed.

The solution is that `SqlaNode.store` should only commit if it is not
inside a nested transaction, otherwise it should simply flush the
addition of the node to the session such that automatically generated
primary keys are populated.

A similar problem was addressed in the `add_nodes` and `remove_nodes`
methods of the `SqlaGroup` class. These would also call `commit` at the
end, regardless of whether they are called within an open transaction.

Devops: Loosen trigger conditions for Docker build CI workflow (#6131)

The docker build workflow was only activated when changes were made to
either the `.docker` directory or `.github/workflows/docker*.yaml` files.
However, changes in the `aiida` package could also break the build and
so could pass by unnoticed.

The trigger conditions are changed to instead trigger always except for
changes to the `tests` and `docs` directories.

Refactor: Replace `all` with `iterall` where beneficial (#6130)

Whenever a `QueryBuilder` result is used in a loop and the total result
is not fully stored in memory in some way, it is beneficial to use
`iterall` since that prevents loading everything in memory for no
reason.

📚 `README.md`: Add Discourse shield to header table (#6138)

Docs: Changes are reverted if exception during `iterall` (#6128)

An explicit test is added to guarantee that changes made while looping
over the result of `iterall` or `iterdict` are reverted if an exception
is raised and not caught before the end of the iterator. A note is added
to the how-to section of the `QueryBuilder`.

Devops: Update the `.devcontainer` to use the new docker stack (#6139)

DevOps: amendment use aiida-core-base image from ghcr.io (#6141)

Amendment to #6139. For an unknown reason, `docker pull` fails for docker.io on this repository. Using the ghcr.io registry works fine.

`PsqlDosBackend`: Fix changes not persisted after `iterall` and `iterdict` (#6134)

The `iterall` and `iterdict` generators of the `QueryBuilder`
implementation for the `PsqlDosBackend` would open a transaction in
order for the `ModelWrapper` to not automatically commit the session
upon any mutation as that would invalidate the cursor. However, it did
not manually commit the session at the end of the iterator, causing any
mutations to be lost when the storage backend was reloaded.

This problem was not just present in the `iterall` and `iterdict`
methods of the `QueryBuilder` but rather the `transaction` method of the
`PsqlDosBackend` never commits the savepoint that is returned by the
`Session.begin_nested` call. Now the `transaction` explicitly commits
the savepoint after the yield and the `QueryBuilder` methods are updated
to simply use the `transaction` method of the storage backend, rather
than going directly to the session.
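
The pattern of committing the savepoint after the yield can be sketched with the standard library `sqlite3` module. This is only an analogy under stated assumptions: the real implementation uses SQLAlchemy's `Session.begin_nested`, and the `transaction` helper and the `node` table here are hypothetical.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def transaction(connection: sqlite3.Connection, name: str = 'sp'):
    """Open a savepoint, release it on success and roll it back on error."""
    connection.execute(f'SAVEPOINT {name}')
    try:
        yield connection
    except BaseException:
        # Undo everything done since the savepoint, then close it.
        connection.execute(f'ROLLBACK TO {name}')
        connection.execute(f'RELEASE {name}')
        raise
    else:
        # The explicit commit of the savepoint after the yield.
        connection.execute(f'RELEASE {name}')


# `isolation_level=None` disables the implicit transaction handling of `sqlite3`.
connection = sqlite3.connect(':memory:', isolation_level=None)
connection.execute('CREATE TABLE node (label TEXT)')

with transaction(connection):
    connection.execute("INSERT INTO node VALUES ('kept')")

try:
    with transaction(connection):
        connection.execute("INSERT INTO node VALUES ('discarded')")
        raise RuntimeError('simulated failure inside the loop')
except RuntimeError:
    pass

labels = [row[0] for row in connection.execute('SELECT label FROM node')]
```

Only the row written in the block that finished without an exception remains in the table.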

This change also required a change in the `SqliteZipBackend`, since the
`transaction` is now called during archive creation and import, but the
backend raised a `NotImplementedError`. This is because it used to be a
read-only backend, however, this limitation was recently lifted. The
commit simply forgot to implement the `transaction` method.

Performance: Cache the lookup of entry points (#6124)

Entry points are looked up using the `entry_points` callable of the
`importlib_metadata` module. It is wrapped by the `eps` function in the
`aiida.plugins.entry_point` module. This call, and the `.select()` filter
that is used on it to find a specific entry point can be quite expensive
as it involves a double loop in the `importlib_metadata` code. Since it
is used throughout the `aiida-core` source code whenever an entry point
is looked up, this causes a significant slowdown of module imports.

The `eps` function now pre-sorts the entry points based on the group.
This guarantees that the entry points of groups starting with `aiida.`
come first in the lookup, giving a small performance boost. The result
is then cached so the sorting is performed just once, which takes on the
order of ~30 µs.

The most expensive part is still the looping over all entry points when
`eps().select()` is called. To alleviate this, the `eps_select` function
is added which simply calls through to `eps().select()`, but which allows
the calls to be cached.

In order to implement the changes, the `importlib_metadata` package,
which provides a backport implementation of the `importlib.metadata`
module of the standard lib, was updated to v6.0.

Docs: Update the image name for docker image (#6143)

It was still pointing to the old name instead of the new `aiida-core-with-services`.

CLI: Make loading of config lazy for improved responsiveness (#6140)

The `VerdiContext` class, which provides the custom context of the
`verdi` commands, loads the configuration. This has a non-negligible
cost and so slows down the responsiveness of the CLI. This is especially
noticeable during tab-completion.

The `obj` custom object of the `VerdiContext` is replaced with a
subclass of `AttributeDict` that lazily populates the `config` key when
it is called with the loaded `Config` class. In addition, the defaults
of some options of the `verdi setup` command, which load a value from
the config and so require the config, are turned into partials such that
they also are lazily evaluated. These changes should give a reduction in
load time of `verdi` of the order of ~50 ms.
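
The two techniques, a dictionary that fills in `config` on first access and defaults wrapped in partials, can be sketched as follows. The class name `LazyConfigAttributeDict`, the `load_config` loader, and the `institution` key are hypothetical stand-ins, not the actual `aiida-core` implementation.

```python
from functools import partial

load_calls = []


def load_config():
    """Hypothetical stand-in for the expensive loading of the config."""
    load_calls.append('loaded')
    return {'institution': 'Example University'}


class LazyConfigAttributeDict(dict):
    """Dictionary with attribute access that loads `config` on first use."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader

    def __getattr__(self, name):
        # Only called when the normal attribute lookup fails.
        if name == 'config' and 'config' not in self:
            self['config'] = self._loader()
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def get_option(obj, key):
    """Read an option from the (lazily loaded) config."""
    return obj.config[key]


obj = LazyConfigAttributeDict(load_config)

# Creating the partial does not touch the config; calling it later does.
lazy_default = partial(get_option, obj, 'institution')
```

Nothing is loaded when `obj` or `lazy_default` are created; the loader runs the first time `lazy_default()` is called and its result is reused afterwards.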

A test of `verdi setup` had to be updated to explicitly provide a value
for the email. This is because now the default is evaluated lazily, i.e.
when the command is actually called in the test. At this point, there is
no value for this config option and so the default is empty. Before, the
default would be evaluated as soon as `aiida.cmdline.commands.cmd_setup`
was imported, at which point an existing config would still contain
these values, binding them to the default, even if the config would be
reset afterwards before the test.

Deprecation: `aiida.orm.nodes.data.upf` and `verdi data core.upf` (#6114)

The `UpfData` data plugin and related utilities have been replaced by
the versions maintained in the `aiida-pseudo` plugin. The latter has now
been significantly adopted by most users and plugins in the ecosystem, so
the outdated original version in `aiida-core` can be deprecated and
removed.

Tests: Print stack trace if CLI command excepts with `run_cli_command`

Before, the test would just fail and say that an exception was raised
but it would not display the actual exception, making it difficult to
debug the problem. In the case of a non-zero exit code, the stderr is
printed as well.

Config: Remove use of `NO_DEFAULT` for `Option.default`

The `Option.default` property would return the global constant
`NO_DEFAULT` in case the option does not specify a default. The idea was
that it could be used to distinguish between an option not defining a
default and one defining the default to be `None`.

The problem is that the various methods that return a config option
value, such as `Config.get_option`, could also return this value. This
would be problematic for certain CLI command options that used the
`aiida.manage.configuration.get_config_option` function to set the
default. If the config option was not defined, the function would return
`()` which is the value of the `NO_DEFAULT` constant. When the option
accepts string values, this value would often even silently be accepted
although it almost certainly is not what the user intended.

This would actually happen for the tests of `verdi setup`, which has the
options `--email`, `--first-name`, `--last-name` and `--institution`
that all define a default through `get_config_option` and therefore the
default would be actually set to `()` in case the config did not specify
these global config options.

Since for config options there is no current use-case for actually
setting a default to `None`, there is no need to distinguish between
this case and a default never having been defined, and so the `NO_DEFAULT`
global constant is removed and replaced by `None`.
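
The way the sentinel could leak into a string option can be illustrated with a short sketch. The `lookup_option` helper and the `email` key are hypothetical; only the value `()` of `NO_DEFAULT` is taken from the description above.

```python
NO_DEFAULT = ()  # value of the removed sentinel, as stated above


def lookup_option(options, name, fallback):
    """Hypothetical lookup: return the stored value or the given fallback."""
    return options.get(name, fallback)


# With the sentinel, an undefined option silently becomes a string default.
leaked_default = str(lookup_option({}, 'email', NO_DEFAULT))

# With `None`, the caller can tell that no default exists.
missing_default = lookup_option({}, 'email', None)
```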

Tests: Fix failing `tests/cmdline/commands/test_setup.py`

The previous commit fixed a bug in the evaluation of defaults for
various options of the `verdi setup` command. Due to the bug, these
options would set a default even if the corresponding config option was
not defined. Instead of no default being defined, the empty tuple `()`
would be set as string value.

As soon as the bug was fixed, the `test_setup_profile_uuid` test started
failing since it doesn't explicitly define values for the options
`--email`, `--first-name`, `--last-name` and `--institution`.

ORM: `Sealable.seal()` return `self` instead of `None`

The `Sealable` mixin is used by the `ProcessNode` which allows it to be
sealed. By having the `seal` method return `self`, which will be the
`ProcessNode` instance, it brings the behavior on par with the `store`
method.

ORM: `ProcessNode.is_valid_cache` is `False` for unsealed nodes

When a `ProcessNode` is not yet sealed, it has not officially been
terminated. At this point it cannot yet be considered a valid cache
source. However, this condition was not considered by the property
`ProcessNode.is_valid_cache`.

This bug manifested itself in very rare situations where a race
condition could lead to a process being cached from an unsealed node.
When a node is stored from the cache, it copies all the attributes except
for the sealed key and adds the outputs. The sealing is then left to the
engine which will complete the cached process as if it had run normally.
The problem arises when the cache source was not yet sealed, and so the
outputs had not yet been added. The cached node will then miss the
output nodes.

CLI: Do not load config in defaults and callbacks during tab-completion (#6144)

The `get_default_profile` default of the `PROFILE` option and the
`set_log_level` callback of the `VERBOSITY` option both load the config.
Since defaults and callbacks are also evaluated during tab-completion
this was slowing down tab-completion significantly since loading the
config has a non-negligible cost.

The `set_log_level` callback is modified to explicitly check whether
we are tab-completing, in which case `ctx.resilient_parsing` is set
to `True`. In this case, the function now returns `None` and no longer
loads the config.

For `get_default_profile`, the `CallableDefaultOption` class is added
which allows the default to be made a callable, which will return `None`
if `ctx.resilient_parsing` is set to `True`.
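
Both checks can be sketched without depending on `click`. Here a `SimpleNamespace` stands in for `click.Context`, of which only the `resilient_parsing` attribute is used; `load_config`, `resolve_default`, and the config keys are hypothetical and do not reproduce the `CallableDefaultOption` class itself.

```python
from types import SimpleNamespace

config_loads = []


def load_config():
    """Hypothetical stand-in for the expensive loading of the config."""
    config_loads.append(1)
    return {'default_profile': 'main', 'loglevel': 'REPORT'}


def get_default_profile():
    """Callable default that is only evaluated when a real value is needed."""
    return load_config()['default_profile']


def resolve_default(default, ctx):
    """Return the default, skipping callable defaults while tab-completing."""
    if ctx.resilient_parsing:
        return None
    return default() if callable(default) else default


def set_log_level(ctx, param, value):
    """Callback that returns early during tab-completion."""
    if ctx.resilient_parsing:
        return None
    return value or load_config()['loglevel']


completing = SimpleNamespace(resilient_parsing=True)
executing = SimpleNamespace(resilient_parsing=False)
```

Calling either function with `completing` returns `None` without invoking `load_config`, while calling them with `executing` loads the config as before.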

`FolderData`: Expose repository API on top-level namespace (#6150)

In 8293e453789d0bad9cf631ecfc08542dd9ad892d, the `Node` interface was
refactored to move the API of the repository to the `base.repository`
namespace. The original methods would be forwarded with a deprecation
message being printed.

Although this made sense for most node types, in an effort to clean up
the node interface which was overpopulated, for the `FolderData` the
repository interface is the main interface and it doesn't make sense to
force the users to go all the way down to the nested `base.repository`
namespace to access it. Therefore the public API of the repository is
restored on the top-level namespace of the `FolderData` class.

Dependencies: Update to `disk-objectstore~=1.0` (#6132)

* Update `DiskObjectStoreRepositoryBackend.get_info` to use the
  dataclasses returned by `count_objects` and `get_total_size`
  directly.
* Change `DiskObjectStoreRepositoryBackend.maintain` to always call
  `clean_storage`, even during live operation of the container.
* Change `DiskObjectStoreRepositoryBackend.maintain` to now pass
  `CompressMode.AUTO` when `compress` is set to `True`.

Tests: Refactor transport tests from `unittest` to `pytest` (#6152)

Now detects all plugins using `get_entry_points` instead of manually parsing
module files to detect `Transport` plugins. Uses `pytest.mark.parametrize` to
run all tests for all registered transport plugins.

ORM: Register `numpy.ndarray` with the `to_aiida_type` to `ArrayData` (#6149)

This will allow `numpy.ndarray` to be passed to process inputs that add
the `to_aiida_type` serializer and expect an `ArrayData`. The single
dispatch will automatically convert the numpy array to an `ArrayData`
instance.
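
The single dispatch mechanism can be sketched with `functools.singledispatch`. To keep the example dependency-free, `ArrayStandIn` plays the role of `numpy.ndarray` and `ArrayDataStandIn` that of `ArrayData`; both are hypothetical, and the function shown is an illustration rather than the real `to_aiida_type`.

```python
from functools import singledispatch


class ArrayStandIn(list):
    """Stand-in for `numpy.ndarray` to keep the sketch dependency-free."""


class ArrayDataStandIn:
    """Stand-in for the `ArrayData` node class."""

    def __init__(self, values):
        self.values = list(values)


@singledispatch
def to_aiida_type(value):
    """Fallback for types without a registered converter."""
    raise TypeError(f'no converter registered for {type(value).__name__}')


@to_aiida_type.register(ArrayStandIn)
def _(value):
    """Converter that is selected automatically based on the type of `value`."""
    return ArrayDataStandIn(value)


converted = to_aiida_type(ArrayStandIn([1, 2, 3]))
```

Registering a type is all that is needed for a serializer to pick the matching converter; unregistered types still hit the fallback.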

Repository: Add the `as_path` context manager (#6151)

The node repository interface intentionally does not provide access to
its file objects through filepaths on the file system. This is because,
for efficiency reasons, the content of a repository may not actually be
stored as individual files on a file system, but for example are stored
in an object store.

Therefore, the contents of the repository can only be retrieved as a
file-like object or read as a string or list of bytes into memory.
Certain use-cases require a file to be made available through a filepath.
An example is when it needs to be passed to an API that only accepts a
filepath, such as `numpy.loadtxt`.

Currently, the user has to manually copy the content of the repository
to a temporary file on disk, and pass the temporary filepath. This
results in clients often having to resort to the following snippet:

    import pathlib
    import shutil
    import tempfile

    import numpy

    with tempfile.TemporaryDirectory() as tmp_path:

        # Copy the entire content to the temporary folder
        dirpath = pathlib.Path(tmp_path)
        node.base.repository.copy_tree(dirpath)

        # Or copy the content of a single file. Use streaming
        # to avoid reading everything into memory
        filepath = dirpath / 'some_file.txt'
        with filepath.open('wb') as target:
            with node.base.repository.open('some_file.txt', 'rb') as source:
                shutil.copyfileobj(source, target)

        # Now pass `filepath` to the library call, e.g.
        numpy.loadtxt(filepath)

This logic is now provided under the `as_path` context manager. This
will make it easy to access repository content as files on the local
file system. The snippet above is simplified to:

    with node.base.repository.as_path() as filepath:
        numpy.loadtxt(filepath)

The method is exposed directly in the interface of the `FolderData` and
`SinglefileData` data types. A warning is added to the docs explaining
the inefficiency of the content having to be read and written to a
temporary directory first, encouraging it only to be used when the
alternative is not an option.
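
How such a context manager can work internally is sketched below. This is a simplified illustration, not the actual method: the function takes a hypothetical `open_source` callable that returns a binary stream, where the real `as_path` operates on the node repository.

```python
import contextlib
import io
import pathlib
import shutil
import tempfile


@contextlib.contextmanager
def as_path(open_source, filename):
    """Yield a temporary filepath that holds a copy of the streamed content."""
    with tempfile.TemporaryDirectory() as tmp_path:
        filepath = pathlib.Path(tmp_path) / filename
        # Stream the content to disk to avoid reading it fully into memory.
        with open_source() as source, filepath.open('wb') as target:
            shutil.copyfileobj(source, target)
        yield filepath
    # The temporary directory and the copy are removed on exit.


with as_path(lambda: io.BytesIO(b'1 2 3\n'), 'data.txt') as filepath:
    content = filepath.read_text()
    existed_inside = filepath.exists()

exists_after = filepath.exists()
```

The copy only lives for the duration of the `with` block, which is why the content has to be written to disk on every use.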

ORM: Add `NodeCaching.CACHED_FROM_KEY` for `_aiida_cached_from` constant

The `_aiida_cached_from` key is used to store, in the extras, the UUID
of the node from which a node was cached. It appeared in a few places as
a string literal. It is now added as the `CACHED_FROM_KEY` class
variable of `NodeCaching`.

CLI: Add `cached` and `cached_from` projections to `verdi process list`

The `cached` projection will print a checkmark for nodes that were
cached from another node. The `cached_from` projection will show the
UUID of the cache source, if it exists.

The `cached` projection is added to the default projections for `verdi
process list`. The use of caching is becoming more prevalent, but often
users can still be surprised by certain behavior when processes are
taken from cache when they didn't expect it. Showing by default in the
output of `verdi process list` whether a process was taken from cache
should provide clarity, since this is often the first place that users
check.

CLI: Lazily validate entry points in parameter types (#6153)

The command line has recently already been updated to lazily load entry
points in order to speed-up tab completion. Here, the validation of
entry points, which is to check whether a given entry point even exists,
is also delayed until the point where it is really necessary. This is
again to keep tab-completion responsive, since even checking whether an
entry point exists has a non-negligible cost.

The `IdentifierParamType` and `PluginParamType` parameter types are
refactored to no longer validate entry points upon construction but
lazily the first time that they are actually invoked.

`Parser.parse_from_node`: Validate outputs against process spec (#6159)

The `parse_from_node` classmethod is a utility function to call the
`parse` method of a `Parser` implementation for a given `CalcJobNode`.
It automatically calls `parse` in the correct manner, passing the
`retrieved` output node, and wrapping it in a calcfunction to optionally
store the provenance.

However, since the `calcfunction` by default has a dynamic output
namespace and so accepts all outputs, it would not perform the same
output validation that the original `CalcJob` would have done since that
most likely will have defined specific outputs. Especially given that
the `parse_from_node` method would often be used in unit tests to test
`Parser` implementations, having the output validation be different
would make it possible to miss bugs. For example, if the parser assigns
an invalid output node, this would go unnoticed.

The `parse_from_node` is updated to patch the output specification of
the `calcfunction` with that of the process class that was used to
create the `CalcJobNode` which is being passed as an argument, as long
as the process class can be loaded. This ensures that when the
`calcfunction` returns the outputs returned by `Parser.parse`, they are
validated against the output specification of the original `CalcJob`
class. If it fails, a `ValueError` is raised.

Config: Switch from `jsonschema` to `pydantic` (#6117)

The configuration of an AiiDA instance is written in JSON format to the
`config.json` file. The schema is defined using `jsonschema` to take
care of validation, however, some validation, for example of the config
options was still happening manually.

Other parts of the code want to start using `pydantic` for model
definition and configuration purposes, which has become the de-facto
standard for these use-cases in the Python ecosystem. Before introducing
another dependency, the existing `jsonschema` approach is replaced by
`pydantic` in current code base first.

Engine: Add the `wait` argument to `submit`

For demos and tutorials, typically in interactive notebooks, it is often
preferred to use `run` instead of `submit` because in this way the cell
will block until the process is done. The cell blocking will signal to
the user that the process is still running and as soon as it returns it
is immediately clear that the results are ready. With `submit` the cell
returns immediately, but the user will now have to resort to manually
checking when the process is done. Solutions are to instruct the user to
call `verdi process list` manually (which they will have to do
repeatedly) or implement some automated loop that checks for the process
to terminate.

However, using `run` has downsides as well, most notably that the
process will be lost if the notebook gets disconnected. For processes
that are expected to run longer, this can be really problematic, and so
`submit` will have to be used regardless.

Here, the `wait` argument is added to `submit`. Set to `False` by
default to keep current behavior, when set to `True`, the function will
mimic the behavior of `run` and only return when the process has
terminated at which point the node is returned. A `REPORT` log message
is emitted each time the state of the process is checked in intervals
of `wait_interval`.

Engine: Add the `await_processes` utility function

The recent addition of the `wait` argument to the `submit` function
allows a user to submit a process to the daemon, while still having the
function block until the process is terminated, as a call to `run` would
do. This can be useful in interactive tutorials and demos where the code
should not advance until the process is done, but one still wants the
benefits of having the daemon run the process.

The downside of this approach is that it only allows to submit and wait
for a single process at a time. Here the `await_processes` function is
added. It takes a list of process nodes and will wait in a loop for all
of them to reach a terminal state. The time between iterations can be
controlled by the `wait_interval` argument.
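
The waiting loop can be sketched as follows. This is a simplified illustration and not the actual implementation: `FakeNode` is a hypothetical stand-in for a process node that only mimics the `is_terminated` property, and logging is left out.

```python
import time


class FakeNode:
    """Stand-in for a process node that terminates after a number of checks."""

    def __init__(self, checks_until_done):
        self._remaining = checks_until_done

    @property
    def is_terminated(self):
        self._remaining -= 1
        return self._remaining < 0


def await_processes(nodes, wait_interval=1.0):
    """Block until all nodes are terminated, checking every `wait_interval` seconds."""
    while any(not node.is_terminated for node in nodes):
        time.sleep(wait_interval)


nodes = [FakeNode(0), FakeNode(2)]
await_processes(nodes, wait_interval=0.001)
```

The function returns only once every node reports a terminal state, so several submitted processes can be awaited with a single call.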

CLI: Keep list unique in `verdi config set --append` (#6162)

When calling the command multiple times for the same value, it would be
added multiple times to the list, even though in all cases a unique list
would be expected.

CLI: Fix `verdi config set` when setting list option (#6166)

`verdi config set` would except when setting a single value for an
option that is of list type, such as `caching.enable_for`. This only
started happening after the recent move to `pydantic` for the
configuration options. Now the `Option.validate` will correctly raise
when trying to set a string value for a list type.

The `verdi config set` implementation is updated to check when it is
setting a value for an option with a list type, and in that case, the
value is wrapped in a list, unless the `--append` or `--remove` flags
are specified.
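
The resulting behavior for list options, including the uniqueness guarantee of `--append`, can be sketched as follows. The function `updated_list_option` is hypothetical and only mirrors the logic described in the commit messages, not the actual `verdi config set` code.

```python
def updated_list_option(current, value, append=False, remove=False):
    """Return the new value of a list option after setting a single value."""
    current = list(current or [])
    if append:
        # Keep the list unique when the same value is appended again.
        return current if value in current else current + [value]
    if remove:
        return [item for item in current if item != value]
    # A plain set wraps the single value in a list.
    return [value]


plain = updated_list_option(None, 'plugin.a')
appended = updated_list_option(['plugin.a'], 'plugin.b', append=True)
appended_twice = updated_list_option(appended, 'plugin.b', append=True)
removed = updated_list_option(appended, 'plugin.a', remove=True)
```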

Docker: Pass environment variable to aiida-prepare script (#6169)

Set `with-contenv` such that environment variables are forwarded. Without
this, settings like the work dir of `localhost` will be set incorrectly and will cause
calculations to fail.

Dependencies: Update to `sqlalchemy~=2.0` (#6146)

A number of minor changes were required for the update:

* Queries that use `order_by` now need to include the property that is
  being ordered on in the list of projections.
* The `Session.bulk_update_mappings` and `Session.bulk_insert_mappings`
  are replaced by using `Session.execute` with the `update` and `insert`
  methods.
* The `sqlalchemy-utils` dependency is no longer used as well as the
  `tests/storage/psql_dos/test_utils.py` file that used it.
* The `future=True` is removed from the engine creation. This was a
  temporary flag to enable v2.0 compatibility with v1.4.
* Test of schema equivalence for export archives needed to be updated
  since the casting of `UUID` columns for PostgreSQL changed.
* Remove the `sphinx-sqlalchemy` dependency since it is not compatible
  with `sqlalchemy~=2.0`. The documentation that relied on it to show
  the database models is temporarily commented out.

Docker: Add folders that automatically run scripts before/after daemon start (#6170)

In order to simplify the implementation of using the `aiida-core` image
as the base for customized images, the `run-before-daemon-start` and
`run-after-daemon-start` script folders are created. Any executables in
these two folders will be executed before and after the AiiDA daemon is
started in the container, respectively.

The standard linux `run-parts` tool is used to scan these folders for
files, which are run in the lexical sort order of their names, according
to the C/POSIX locale character collation rules.
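
The resulting execution order can be previewed in Python, since `sorted` compares strings by code point, which for ASCII names matches the byte-wise collation of the C/POSIX locale. The script names below are made up for illustration.

```python
# Hypothetical script names placed in one of the two folders.
scripts = ['20-load-data', '10-configure', 'Z-final', 'a-extra', '100-early']

# Digits sort before uppercase letters, which sort before lowercase letters.
execution_order = sorted(scripts)
```

Note that `100-early` runs before `20-load-data` because the comparison is character by character, so zero-padded numeric prefixes are a safe way to control the order.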

`Config`: Add the `create_profile` method

This method takes a name and storage backend class, along with a
dictionary of configuration parameters, and creates a profile for it,
initializing the storage backend. If successful, the profile is added to
the config and it is saved to disk.

It is the `Config` class that defines the "structure" of a profile
configuration and so it should be this class that takes care of
generating this configuration. The storage configuration is the exception, since
there are multiple options for this, where the `StorageBackend` plugin
defines the structure of the required configuration dictionary.

This method will make it possible to remove all places in the code
where a new profile and its configuration dictionary are built up manually.

CLI: Add the command `verdi profile setup`

This command uses the `DynamicEntryPointCommandGroup` to allow creating
a new profile with any of the plugins registered in the `aiida.storage`
group. Each storage plugin will typically require a different set of
configuration parameters to initialize and connect to the storage. These
are generated dynamically from the specification returned by the method
`get_cli_options` defined on the `StorageBackend` base class. Each
plugin implements the abstract `_get_cli_options` method which is called
by the former and defines the configuration parameters of the plugin.

The values passed to the plugin specific options are used to instantiate
an instance of the storage class, registered under the chosen entry point
which is then initialised. If successful, the new profile is stored in
the `Config` and a default user is created and stored. After that, the
profile is ready for use.

`DynamicEntryPointCommandGroup`: Use `pydantic` to define config model

The `DynamicEntryPointCommandGroup` depends on the entry point classes
to implement the `get_cli_options` method to return a dictionary with a
specification of the options to create. The schema of this dictionary
was a custom ad-hoc solution for this purpose. Here we switch to using
pydantic's `BaseModel` to define the `Config` class attribute which
defines the schema for the configuration necessary to construct an
instance of the entry points class.

ORM: Add the `User.is_default` property

This is a useful shortcut to determine whether a `User` instance is the
current default user. The previous way of determining this was to retrieve
the default user from the collection `User.collection.get_default()` and
manually compare it with the `User` instance.

CLI: Improve the formatting of `verdi user list`

Uses the `tabulate` package to create a nicely formatted table as is
used in many other `verdi` commands already. The results are ordered by
the users' emails.

Manager: Add the `set_default_user_email`

Each profile can define which user in its storage backend should be
considered the default. This is necessary because each ORM entity, when
created, needs to specify a `User` object and we don't want the user to
always have to explicitly define this manually.

The default user for a profile is stored in the configuration file by
the email of the `User` object. However, in a loaded storage backend,
this default user is also cached as the `User` object. This means that
when the default user is changed, it should be changed both in the
configuration file, but if a storage backend is loaded, the cache should
also be invalidated, such that the next time the default user is
requested, the new one is properly loaded from the database.

Since this change affects both the configuration as well as the
currently loaded storage, the `set_default_user_email` is added to the
`Manager` class, since that controls both. It calls through to the same
method on the `Config` class, which is responsible for updating the
`Config` instance in memory and writing the changes to disk. Then the
manager resets the default user on the storage backend, if any is
loaded.

The `verdi user set-default` command is updated to use the new method. A
test is added for the command, which didn't exist yet. The command is
updated to use `Manager.set_default_user_email` even though it could use
`Config.set_default_user_email` since the Python interpreter will shut
down immediately after anyway. However, the test would fail if the
latter would be used, since the loaded storage backend would not have
been updated, which is used by `User.collection.get_default()`. This
demonstrates why in active Python interpreters only the method on the
manager should be used. A warning is added to the docstring on the
configuration class.

CLI: Reuse options in `verdi user configure` from setup

This guarantees that the same types are used in both commands; before,
they were actually different. The `--set-default` flag now also gets its default
from the current value on the selected user, just as is done for the
other properties.

CLI: Set defaults for user details in profile setup

The user options `--first-name`, `--last-name` and `--institution` in
the `verdi quicksetup/setup` commands were recently made required but
did not provide a default. This would make creating profiles
significantly more complex than necessary. For simple test and demo
profiles the user might not necessarily care about these user details.

Here we add defaults for these options. Even for production profiles
this is a sensible approach since these details can always be freely
updated later on with `verdi user configure`. This is also the reason
that the `--email` does not provide a default because that can not be
changed later on.

Devops: Trigger Docker image build when pushing to `support/*` branch (#6175)

Dependencies: Add support for Python 3.12

The file `requirements/requirements-py-3.12.txt` is added which provides
a complete environment for Python 3.12. The CI is updated to add Python
3.12 in all strategy matrices or replace Python 3.11 where only the
oldest and latest Python version are tested. Note that the Python
version for the `pre-commit` jobs is kept at 3.10 for now. The reason
is that in Python 3.12 f-strings are improved by allowing nested quotes.
For example:

    f'{some_dict['key']}'

is now supported, whereas before Python 3.12 this would not work since
the nested quotes would not be parsed correctly and the internal quotes
had to be either escaped or changed for double quotes.

A number of dependencies had to be updated to make them compatible with
Python 3.12, usually because older versions still relied on the
`distutils` standard library module, which has been removed, or on
`pkg_resources`, which is provided by `setuptools` and is no longer
available by default. The `utils/dependency_management.py` had to be updated similarly
to replace `pkg_resources` with `packaging`. The latter had to be
updated to `packaging==23.0` in order to have the `__eq__`
implementation for the `Requirement` class which the script relies on.

The memory leak tests are skipped on Python 3.12 because currently they
hang. The problem is with the `pympler.muppy.get_objects` method. This
calls `gc.collect` internally, but that call is blocking. The exact
cause is as of yet unknown. The garbage collecting has been changed in
Python 3.12 so it is not completely unexpected either.

The `sphinxcontrib-details-directive` dependency is removed. It was used
for the sphinx extension to add the ports of port namespaces in HTML's
`<details>` tags, allowing them to be collapsed. This could help with
readability in case of large namespaces. However, this package breaks on
Python 3.12 since it imports the deprecated `pkg_resources` package.
Since the package has not been maintained for 4 years, it is unlikely
that this will be fixed, and so instead it is removed for now. See
https://github.com/sphinx-contrib/sphinxcontrib-details-directive

Dependencies: Restore `sphinx-sqlalchemy`

This dependency was temporarily removed since it didn't yet support
sqlalchemy v2, but that has now been released with `v0.2.0`.

Add the `SqliteDosStorage` storage backend

The implementation subclasses the `PsqlDosBackend` and replaces the
PostgreSQL database with an sqlite database. By doing so, the
initialization of the storage only requires a directory on the local
file system where it will create the sqlite file for the database and a
container for the disk-objectstore.

The advantage of this `sqlite_dos` storage over the default `psql_dos`
is that it doesn't require a system service like PostgreSQL. As a
result, creating the storage is very straightforward and can be done
with almost no setup. The advantage over the existing `sqlite_zip` is
that the `sqlite_dos` is not read-only but can be used to write data
as well.

Combined with the `verdi profile setup` command, a working profile can
be created with a single command:

    verdi profile setup core.sqlite_dos -n --profile name --email e@mail

This makes this storage backend very useful for tutorials and demos
that don't rely on performance.

`SqliteZipBackend`: Return `self` in `store`

The `store` method of the `SqliteEntityOverride` class, used by the
`SqliteZipBackend` storage backend (and with that all other backends
that subclass it), did not return `self`. This is in conflict with the
signature of the base class that it is overriding.

Since the `SqliteZipBackend` is read-only and so `store` would never be
called, this problem went unnoticed. However, with the addition of the
`SqliteDosStorage` backend which is *not* read-only, this bug would
surface when trying to store a node since certain methods rely on this
method returning the node instance itself.
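
The effect of the missing return value can be sketched in a few lines.
The class names below are hypothetical stand-ins and not the actual
AiiDA classes; the sketch only shows why callers that rely on the return
value of `store` break:

```python
class Entity:
    """Hypothetical base class whose ``store`` returns the instance."""

    def __init__(self):
        self.is_stored = False

    def store(self):
        self.is_stored = True
        return self


class BrokenOverride(Entity):
    """Override that stores but forgets to return ``self``."""

    def store(self):
        super().store()


class FixedOverride(Entity):
    """Override that honors the signature of the base class."""

    def store(self):
        super().store()
        return self


broken = BrokenOverride().store()  # None, so chained calls would fail
fixed = FixedOverride().store()    # the stored instance itself
```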

Fix `QueryBuilder.count` for storage backends using sqlite

The storage backends that use sqlite instead of PostgreSQL, i.e.,
`core.sqlite_dos`, `core.sqlite_temp` and `core.sqlite_zip`, piggyback
on the ORM models defined by the `core.psql_dos` backend by dynamically
converting to the sqlite equivalent database models.

The current implementation of `SqlaGroup.count` would raise an
exception when used with an sqlite backend since certain columns would
be ambiguously defined:

    sqlite3.OperationalError: ambiguous column name: db_dbgroup.id

This is fixed by explicitly wrapping the classes that are joined in
`sqlalchemy.orm.aliased` which will force sqlalchemy to properly alias
each class removing the ambiguity.
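
The ambiguity and the effect of aliasing can be reproduced with plain
SQL through the standard library `sqlite3` module. This is only a sketch
under simplifying assumptions: the table is reduced to two columns and
the queries are illustrative, not the ones that `sqlalchemy` generates
for AiiDA.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE db_dbgroup (id INTEGER PRIMARY KEY, label TEXT)')
connection.executemany('INSERT INTO db_dbgroup (label) VALUES (?)', [('a',), ('b',)])

# Referencing the same table twice without aliases leaves the column
# reference ambiguous.
error = None
try:
    connection.execute('SELECT db_dbgroup.id FROM db_dbgroup, db_dbgroup')
except sqlite3.OperationalError as exception:
    error = str(exception)

# Giving each occurrence its own alias removes the ambiguity.
query = 'SELECT count(g1.id) FROM db_dbgroup AS g1 JOIN db_dbgroup AS g2 ON g1.id = g2.id'
count = connection.execute(query).fetchone()[0]
connection.close()
```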

Tests: Remove deprecated `aiida/manage/tests/main` module

This module had been deprecated and replaced a long time ago in favor of
`pytest` based fixtures that provide a complete testing environment with
test profiles being created on-the-fly.

Tests: Move ipython magic tests to main unit test suite

The `.github/system_tests/test_ipython_magics.py` file provided tests
for the ipython magics, however, these can simply be run in the main
test suite invoked directly through `pytest`.

Tests: Move memory leak tests to main unit test suite

The `.github/system_tests/pytest/test_memory_leaks.py` file provided
tests to ensure memory is not being leaked when running processes. These
tests do not require being executed in standalone `pytest` invocation
but can be included in the main unit test suite. Historically, the
separation was required when the main unit test suite was not fully
using `pytest` yet but used a framework based on `unittest`.

With this migration, the last test in the `.github/workflows/tests.sh`
script has been moved and now it merely calls the main test suite. The
CI workflows that called it now directly invoke the command to
run the main test suite, and the `tests.sh` script is deleted.

Pre-commit: Disable `no-member` and `no-name-in-module` for `aiida.orm`

After the previous commit, for some unknown reason, `pylint` started
throwing `no-member` and `no-name-in-module` warnings for import lines
that import a class directly from `aiida.orm`. The imports actually work
just fine and `pylint` didn't use to complain. The changes of the
previous commit seem completely unrelated, so for now the warnings are
ignored. Soon `pylint` will anyway be replaced by `ruff`.

Docs: Various minor fixes to `run_docker.rst` (#6182)

Some typos were reported by users.

Dependencies: Update requirement `mypy~=1.7` (#6188)

This allows getting rid of many exclude statements since those
corresponded to bugs in `mypy` that have now been fixed.

Add the `report` method to `logging.LoggerAdapter` (#6186)

AiiDA defines the `REPORT` log level and adds the `report` method to the
`logging.Logger` class so a log message can easily be emitted at that
level. However, the logger of `Process` instances is a `LoggerAdapter`
which does not inherit from `Logger` so the method also needs to be
added there independently. Without this fix, calling `self.logger.report` in
the `Parser.parse` method would raise an `AttributeError`.
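
The idea can be sketched with the standard library alone. The numeric
value of the level and the helper function below are assumptions made
for illustration; this is not AiiDA's actual implementation:

```python
import io
import logging

REPORT = 23  # assumed value, between INFO (20) and WARNING (30)
logging.addLevelName(REPORT, 'REPORT')


def report(self, msg, *args, **kwargs):
    """Log ``msg`` at the custom ``REPORT`` level."""
    self.log(REPORT, msg, *args, **kwargs)


# ``LoggerAdapter`` does not inherit from ``Logger``, so the method has
# to be attached to both classes separately.
logging.Logger.report = report
logging.LoggerAdapter.report = report

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))

logger = logging.getLogger('sketch.report')
logger.setLevel(REPORT)
logger.addHandler(handler)
logger.propagate = False

adapter = logging.LoggerAdapter(logger, {})
adapter.report('parsing finished')
output = stream.getvalue()
```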

Docker: Add `rsync` and `graphviz` to system requirements

The former is used for the backup functionality and the latter is needed
to generate graphic representations of provenance graphs.

Dependencies: Add upper limit `jedi<0.19`

Certain tab completion functionality in ipython shells, for example the
completion of `Node.inputs`, was broken for `jedi==0.19` in combination
with recent versions of `ipython`.

Docker: Disable the consumer timeout for RabbitMQ (#6189)

As of RabbitMQ v3.8.15, a default `consumer_timeout` of 30 minutes is
set. If a task is not acknowledged within this time limit, the consumer
of the task is considered dead and its tasks are rescheduled. This is
problematic for AiiDA since tasks often take multiple hours.

The `consumer_timeout` can only be changed through the server config.
Here we disable it through the `advanced.config`.

Typing: Add overload signatures for `get_object_content`

Added for the `FolderData` and `NodeRepository` classes.

Typing: Add overload signatures for `open`

Added for the `FolderData` and `NodeRepository` classes. The signature
of the `SinglefileData` was actually incorrect as it defined:

    t.Iterator[t.BinaryIO | t.TextIO]

as the return type, which should really be:

    t.Iterator[t.BinaryIO] | t.Iterator[t.TextIO]

The former will cause `mypy` to raise an error.
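
A minimal sketch of what such overload signatures can look like is given
below. The `InMemoryFile` class and its `mode` parameter are hypothetical
and only serve to illustrate the pattern; the real signatures in AiiDA
take additional arguments.

```python
import contextlib
import io
import typing as t


class InMemoryFile:
    """Hypothetical class exposing an ``open`` context manager."""

    def __init__(self, content: bytes) -> None:
        self._content = content

    @t.overload
    @contextlib.contextmanager
    def open(self, mode: t.Literal['r'] = ...) -> t.Iterator[t.TextIO]: ...

    @t.overload
    @contextlib.contextmanager
    def open(self, mode: t.Literal['rb']) -> t.Iterator[t.BinaryIO]: ...

    @contextlib.contextmanager
    def open(
        self, mode: t.Literal['r', 'rb'] = 'r'
    ) -> t.Union[t.Iterator[t.BinaryIO], t.Iterator[t.TextIO]]:
        # The implementation returns a union of iterators, not an
        # iterator over a union of handle types.
        if mode == 'rb':
            yield io.BytesIO(self._content)
        else:
            yield io.StringIO(self._content.decode('utf-8'))


holder = InMemoryFile(b'hello')

with holder.open('r') as handle:
    text = handle.read()

with holder.open('rb') as handle:
    data = handle.read()
```

With the overloads, a type checker can infer a text handle for mode
`'r'` and a binary handle for mode `'rb'` instead of a union of both.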

Docs: Add changes of v2.4.1 to `CHANGELOG.md`

Docs: Update citation suggestions (#6184)

ORM: Filter inconsequential warnings from `sqlalchemy` (#6192)

Recently, the code was updated to be compatible with `sqlalchemy~=2.0`
which caused a lot of warnings to be emitted. As of `sqlalchemy==2.0.19`
the `sqlalchemy.orm.unitofwork.UOWTransaction.register_object` method
emits a warning whenever an object is registered that is not part of the
session. See for details:

https://docs.sqlalchemy.org/en/20/changelog/changelog_20.html#change-53740fe9731bbe0f3bb71e3453df07d3

This can happen when the session is committed or flushed and an
object inside the session contains a reference to another object, for
example through a relationship, that is not explicitly part of the
session.
If that referenced object is not already stored and persisted, it might
get lost. On the other hand, if the object was already persisted before,
there is no risk.

This situation occurs a lot in AiiDA's code base. A prime example is when
a new process is created. Typically the input nodes are either already
stored, or stored first. As soon as they get stored, the session is
committed and the session is reset by expiring all objects. Now, the
input links are created from the input nodes to the process node, and at
the end the process node is stored to commit and persist it with the
links. It is at this point that Sqlalchemy realises that the input nodes
are not explicitly part of the session.

One direct solution would be to add the input nodes again to the session
before committing the process node and the links. However, this code is
part of the backend independent :mod:`aiida.orm` module and this is a
backend-specific problem. This is also just one example and there are
most likely other places in the code where the problem arises. Therefore,
as a workaround, a warning filter is put in place to silence this
particular warning. Note that `pytest` undoes all registered warning
filters, so it has to be added again in the `pytest` configuration in
the `pyproject.toml`.
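
How a targeted warning filter behaves can be sketched with the standard
library `warnings` module. The warning class and the message pattern
below are stand-ins chosen for illustration and are not claimed to match
the exact filter used in AiiDA:

```python
import warnings


class SketchWarning(Warning):
    """Hypothetical stand-in for the warning class of the ORM library."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Silence only warnings whose message starts with the given pattern.
    warnings.filterwarnings(
        'ignore',
        message=r'Object of type .* not in session',
        category=SketchWarning,
    )
    warnings.warn('Object of type <DbNode> not in session', SketchWarning)
    warnings.warn('some other warning', SketchWarning)

messages = [str(warning.message) for warning in caught]
```

Only the warning that does not match the pattern is recorded, so
unrelated warnings of the same category remain visible.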

Add the `aiida.common.log.capture_logging` utility

The `capture_logging` is a context manager that yields a stream in
memory to which all content written to the specified logger is
duplicated. This does not interfere with any existing logging handlers
whatsoever and so is non-destructive. It is useful to capture any output
that is logged into memory in order to be able to act on it.
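
A minimal sketch of such a context manager, using only the standard
library, could look as follows. The signature is an assumption made for
illustration and may differ from the actual utility in
`aiida.common.log`:

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def capture_logging(logger):
    """Yield an in-memory stream that receives everything logged to ``logger``."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # An extra handler is added; existing handlers are left untouched.
    logger.addHandler(handler)
    try:
        yield stream
    finally:
        logger.removeHandler(handler)


logger = logging.getLogger('sketch.capture')
logger.setLevel(logging.WARNING)
logger.propagate = False

with capture_logging(logger) as stream:
    logger.warning('something happened')
    captured = stream.getvalue()

handlers_after = list(logger.handlers)
```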

CLI: Add the `verdi process repair` command

This command replaces `verdi devel rabbitmq tasks analyze`. This command
was added to the `verdi devel` namespace because it is working around a
problem and it was experimental. Since then, it has proved really
effective and so should be made more directly available to users in case
of stuck processes.

The implementation is moved to `verdi process repair` and the original
command simply forwards to it, while emitting a message that it is
deprecated.

While the original command would not do anything by default and the
`--fix` flag had to be explicitly specified, this behavior is inverted
for `verdi process repair`. By default it will fix inconsistencies and
the `--dry-run` flag can be used to have the old behavior of just
detecting them.

CLI: Add repair hint to `verdi process play/pause/kill`

If a process' task was lost, the `verdi process play/pause/kill`
commands will report the error:

    Error: Process<****> is unreachable.

If at least one of the processes is reported to be unreachable, the
commands now log a message suggesting that the user run
`verdi process repair` to repair all processes whose tasks were lost.

Docs: Add changes of v2.4.2 to `CHANGELOG.md`

Add support for `NodeLinksManager` to YAML serializer (#6199)

The `Node.inputs` and `Node.outputs` properties return instances of the
`aiida.orm.utils.managers.NodeLinksManager` class. Support is added to
the `aiida.orm.utils.serialize` YAML serializers such that these
instances can now be stored in the context of `WorkChains` as these are
serialized to YAML for the checkpoints.

Process functions: Fix bug with variable arguments (#6201)

The process function implementation contained a bug where a function
that specified variable positional arguments followed by keyword
arguments would not be accepted. For example:

    def function(*args, arg_a, arg_b):
        pass

    function(*(1, 2), arg_a=3, arg_b=4)

is a perfectly valid function definition and call but it would not work
when decorated into a process function.

Part of the problem was that the class argument `_varargs` of the
dynamically constructed `FunctionProcess` was used for the name of
variable positional as well as keyword arguments. If both were defined,
the former would be overridden by the latter. This is now split in
`_var_positional` and `_var_keyword` respectively.
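
The distinction between the two kinds of variable arguments can be
sketched with the standard library `inspect` module. The helper below is
hypothetical and only illustrates why a single attribute cannot hold
both names; it is not the code used by `FunctionProcess`:

```python
import inspect


def find_variable_arguments(func):
    """Return the names of the ``*args`` and ``**kwargs`` parameters, if any."""
    var_positional = None
    var_keyword = None

    for name, parameter in inspect.signature(func).parameters.items():
        if parameter.kind is inspect.Parameter.VAR_POSITIONAL:
            var_positional = name
        elif parameter.kind is inspect.Parameter.VAR_KEYWORD:
            var_keyword = name

    return var_positional, var_keyword


def function(*args, arg_a, arg_b, **kwargs):
    pass


names = find_variable_arguments(function)
```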

The conversion of the original positional and keyword arguments passed
to the function into the process input dictionary is simplified, as is
the reverse conversion, where the process inputs are converted back
into positional and keyword arguments before passing them to the
wrapped function.

ORM: Implement the `Dict.get` method (#6200)

This makes the behavior of `Dict` identical to that of a plain `dict`
with respect to this method. The `Dict` class inherited the `get` method
from the `Entity` base class, but that has a completely different
purpose that is not of interest for users of the `Dict`…