Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add verdi profile setup #6023

Merged
merged 3 commits into from
Nov 9, 2023
Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented May 16, 2023

The verdi profile setup command is based of the DynamicEntryPointGroup, which is already used for verdi code create, to dynamically create a subcommand for each registered storage entry point. It automatically generates CLI options to allow users to specify configuration parameters necessary to initialize the storage.

Edit: Removing the commit that adds the SqliteDosStorage for now, as that is not required for this PR and it needs more work.

The SqliteDosStorage is a variant of the default PsqlDosStorageBackend swapping the Postgres database for an Sqlite one. This gets rid of the server requirement as sqlite can run of a file on disk, simplifying the setup significantly. This makes it a perfect storage for use in demos, tutorials and quick temporary work.

There was already the SqliteTempStorage, but that is a fully in memory storage, which is not persisted across Python interpreters, so it has limited usability. The SqliteZipStorage is read-only and so also won't be useful for work requiring storing of new data.

The verdi profile setup command is added to make it easy to create a new profile. With a single command, a fully functioning profile can be setup ready for use:

(aiida_310) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi profile setup core.sqlite_dos -n --profile sqlite-dos
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `sqlite-dos`.
(aiida_310) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi profile show sqlite-dos
Report: Profile: sqlite-dos
PROFILE_UUID: 949cca3793fb45fe85cd86d0f43629d0
default_user_email: mail@sphuber.net
options: {}
process_control:
  backend: rabbitmq
  config:
    broker_host: 127.0.0.1
    broker_password: guest
    broker_port: 5672
    broker_protocol: amqp
    broker_username: guest
    broker_virtual_host: ''
storage:
  backend: core.sqlite_dos
  config:
    filepath: /tmp/tmpnr_wfwd8

(aiida_310) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi -p sqlite-dos shell
Python 3.10.10 (main, Feb  8 2023, 14:50:01) [GCC 9.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.34.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: node = Int(1).store()

In [2]: node.pk
Out[2]: 1

In [4]:                                                                                                                                                                                        
Do you really want to exit ([y]/n)? 
(aiida_310) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi -p sqlite-dos shell
Python 3.10.10 (main, Feb  8 2023, 14:50:01) [GCC 9.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.34.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: load_node(1).value
Out[1]: 1

@sphuber sphuber requested review from unkcpz and mbercx May 16, 2023 14:26
@sphuber sphuber force-pushed the feature/verdi-profile-setup branch from 0f4fe2b to 3f8392d Compare May 17, 2023 08:44
self.connection.commit() # pylint: disable=no-member


class SqliteDosStorage(PsqlDosBackend):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sceptical that here and in SqliteDosMigrator you can simply inherit from the PostgreSQL implementation and it will all just work, particularly migration, where you have differences in data types (like JSON and not JSONB).

You can also see methods like delete_nodes_and_connections which specifically uses psql_dos ORM models

Obviously testing would be required

@superstar54
Copy link
Member

I would like to say that this PR is a milestone for improving usability. Looking forward to it.

@chrisjsewell
Copy link
Member

I would like to say that this PR is a milestone for improving usability. Looking forward to it.

Its important to note, and document, though that this should probably
never be used in "production".
SQLite is not made to handle many concurrent writes, and you may well run into database locking issues

@sphuber
Copy link
Contributor Author

sphuber commented Jun 20, 2023

I agree with @chrisjsewell . As he rightly pointed out, there are a lot of shortcuts in the current implementation that have to be checked and it is not sure it supports all use-cases of AiiDA yet. But I think this will never be a real solution for production. Nevertheless, it may be fine for lots of use cases that are low-throughput and could do with a single daemon worker. For those cases it may work and be sufficient. But this needs to be tested and developed a lot more, for which I just haven't had the time yet.

@sphuber sphuber force-pushed the feature/verdi-profile-setup branch 7 times, most recently from 7f34e08 to 10b977a Compare July 8, 2023 03:37
@sphuber sphuber changed the title Add the SqliteDosStorage backend and verdi profile setup Add verdi profile setup Jul 8, 2023
@chrisjsewell
Copy link
Member

I see you are adding pydantic here now. I'm not necessarily against that, it's a nice package, but I think though we would need to put in some thought as to if we are happy to introduce a new dependency and (a) the extra maintenance that can require, plus (b) the potential for incompatibilities with plugins / users code.

This is a good example here of dependency implications, since I see you are pinning to pydantic v1, yet I know v2 has just been released (rewritten in rust, my new favourite language). So perhaps you need to update the version here, but then that will make it currently incompatible with aiida-restapi.

Also, if we are introducing pydantic, then there is obviously a lot more places that it could potentially be used, e.g. in-place of jsonschema validation and possibly places where we have started to use dataclasses (which in-turn are replacing attribute dicts).
We should ideally have a consistent approach to these kind of validated data structures

@sphuber sphuber force-pushed the feature/verdi-profile-setup branch from 10b977a to 1cba486 Compare July 9, 2023 17:48
@sphuber
Copy link
Contributor Author

sphuber commented Jul 9, 2023

I see you are adding pydantic here now. I'm not necessarily against that, it's a nice package, but I think though we would need to put in some thought as to if we are happy to introduce a new dependency and (a) the extra maintenance that can require, plus (b) the potential for incompatibilities with plugins / users code.

Sure, definitely think we should discuss it. The main reason I chose it is because I felt the custom solution of the _get_cli_options didn't really have to be a custom solution and could be way better. Especially for the storage backends, a solution based on something existing, like pydantic, comes with a lot of advantages. There is automatic genertion of a JSON schema, automatic validation etc. So I think it makes sense here to use a standard as opposed to our own ad-hoc solution. I also added it here to experiment with it and to see if we can use a similar approach to add this to Data subclasses to declare their data schema, which can then be leveraged by third-party tools to inspect this schema, such as the automatic serialization/deserialization discussed last week in the context of Louis declarative workflows.

This is a good example here of dependency implications, since I see you are pinning to pydantic v1, yet I know v2 has just been released (rewritten in rust, my new favourite language). So perhaps you need to update the version here, but then that will make it currently incompatible with aiida-restapi.

I am aware of v2 and originally actually wrote it against that version, but I noticed that downstream dependencies break. Not just the aiida-restapi but also the pymatgen dependencies through I believe emmet. Since v2 is only just out and I don't really need its features for this, we can just use v1 for now and wait for v2 to become more adopted in the ecosystem before migrating there ourselves as well.

Also, if we are introducing pydantic, then there is obviously a lot more places that it could potentially be used, e.g. in-place of jsonschema validation and possibly places where we have started to use dataclasses (which in-turn are replacing attribute dicts). We should ideally have a consistent approach to these kind of validated data structures

Agree. One big example that I would want to change is the config options which are also using a custom solution, which would be way better off using something like pydantic. But the change has to start somewhere of course, so if we agree that pydantic is a good replacement, we can start here. Happy to present this proposal next AiiDA meeting so the others can join in the discussion.

@sphuber sphuber force-pushed the feature/verdi-profile-setup branch 5 times, most recently from d12ccef to d4afaa2 Compare July 9, 2023 21:12
@mbercx
Copy link
Member

mbercx commented Jul 10, 2023

Quick note re the pydantic version: Both aiida-quantumespresso and aiida-submission-controller use pydantic v1 at the moment.

I was playing around with the new verdi profile setup command and a few questions came up:

  1. Are we planning to remove the verdi setup command in favor of this one?
  2. I noticed that the process_control options are no longer there for the verdi profile setup core.psql_dos command, is this intentional?

@chrisjsewell
Copy link
Member

I believe emmet

Indeed emmet-core is the only depedency in aiida-core[atomic_tools,notebook] pulling in pydantic, and I imagine they will upgrade sooner rather than later: materialsproject/emmet#768 (comment)

I would suggest it might be ideal to wait a little, to see if pydantic v2 can be used straight away. Otherwise, it just ends up on the backlog of outdated dependencies, like sqlalchemy v1.4 -> v2 and aiormq v6 -> v9

One big example that I would want to change is the config options which are also using a custom solution

Well this currently uses jsonschema, so it won't be too much of a change, since they are of course both solutions to the same problem of data validation.

I suggest you make a separate PR, introducing pydantic by replacing all uses of jsonschema, and thus this being a "one-in, one-out" dependency change.

@sphuber
Copy link
Contributor Author

sphuber commented Jul 10, 2023

Are we planning to remove the verdi setup command in favor of this one?

I think verdi setup could indeed be deprecated at some point since this would be a complete replacement, but now for all storage backends and not just psql_dos.

I noticed that the process_control options are no longer there for the verdi profile setup core.psql_dos command, is this intentional?

Good point, this is still missing, but they can be easily added, just like the user options are already there. I just hadn't added the broker options yet. Will add them soon.

I am just struggling with the tests mysteriously failing only on Python 3.9 and only when I import pydantic. Somehow, in this configuration, it causes problems in the daemon stopping fixture. I have no idea what the link is or how to further debug it...

@sphuber
Copy link
Contributor Author

sphuber commented Jul 10, 2023

I suggest you make a separate PR, introducing pydantic by replacing all uses of jsonschema, and thus this being a "one-in, one-out" dependency change.

Fair enough. Will have a stab at this later

@Andrew-S-Rosen
Copy link

Andrew-S-Rosen commented Jul 10, 2023

Indeed emmet-core is the only depedency in aiida-core[atomic_tools,notebook] pulling in pydantic, and I imagine they will upgrade sooner rather than later: materialsproject/emmet#768 (comment)

Indeed we will! It's on our to-do list (@janosh, @munrojm, @tschaume). If it's not done "soon", ping me and I can take a stab at it although I'm not the primary maintainer.

@sphuber sphuber force-pushed the feature/verdi-profile-setup branch from 129b405 to 99da950 Compare July 13, 2023 13:02
@sphuber sphuber force-pushed the feature/verdi-profile-setup branch 2 times, most recently from 863cdf7 to e34613f Compare September 8, 2023 12:55
@sphuber sphuber force-pushed the feature/verdi-profile-setup branch 3 times, most recently from 8eeee50 to aa3904f Compare October 12, 2023 14:06
@sphuber sphuber force-pushed the feature/verdi-profile-setup branch 2 times, most recently from 9657ee4 to 40be8e4 Compare October 25, 2023 16:12
@sphuber
Copy link
Contributor Author

sphuber commented Oct 25, 2023

With #6117 now merged, this PR is unblocked and ready to go. Any takers for a review? @unkcpz @mbercx @edan-bainglass @superstar54

This method takes a name and storage backend class, along with a
dictionary of configuration parameters, and creates a profile for it,
initializing the storage backend. If successful, the profile is added to
the config and it is saved to disk.

It is the `Config` class that defines the "structure" of a profile
configuration and so it should be this class to takes care of generating
this configuration. The storage configuration is the exception, since
there are multiple options for this, where the `StorageBackend` plugin
defines the structure of the required configuration dictionary.

This method will allow to remove all places in the code where a new
profile and its configuration dictionary is built up manually.
This command uses the `DynamicEntryPointCommandGroup` to allow creating
a new profile with any of the plugins registered in the `aiida.storage`
group. Each storage plugin will typically require a different set of
configuration parameters to initialize and connect to the storage. These
are generated dynamically from the specification returned by the method
`get_cli_options` defined on the `StorageBackend` base class. Each
plugin implements the abstract `_get_cli_options` method which is called
by the former and defines the configuration parameters of the plugin.

The values passed to the plugin specific options are used to instantiate
an instance of the storage class, registered under the chosen entry point
which is then initialised. If successful, the new profile is stored in
the `Config` and a default user is created and stored. After that, the
profile is ready for use.
The `DynamicEntryPointCommandGroup` depends on the entry point classes
to implement the `get_cli_options` method to return a dictionary with a
specification of the options to create. The schema of this dictionary
was a custom ad-hoc solution for this purpose. Here we switch to using
pydantic's `BaseModel` to define the `Config` class attribute which
defines the schema for the configuration necessary to construct an
instance of the entry points class.
@sphuber sphuber force-pushed the feature/verdi-profile-setup branch from 40be8e4 to d68edcb Compare November 9, 2023 12:51
@sphuber sphuber merged commit 1d8ea2a into aiidateam:main Nov 9, 2023
33 checks passed
@sphuber sphuber deleted the feature/verdi-profile-setup branch November 9, 2023 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants