Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Rework the installation section #6455

Merged
merged 1 commit into from
Jun 28, 2024

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Jun 5, 2024

The verdi setup and verdi quicksetup commands have been deprecated and replaced by verdi profile setup and verdi presto. The installation docs were heavily outdated and the flow was scattered.

The biggest change is that there now is a "quick install guide" that relies on verdi presto to provide an install route that is fool proof and will work on almost any system in a minimal amount of commands.

Then there is the "complete installation guide" that provides all the details necessary to fully customize an installation.

Copy link
Contributor

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work, thanks a lot, @sphuber! I added a few comments. Might go through it a second time in the following days with a fresh pair of 👀, but wanted to already save my comments up to this point.

docs/source/howto/daemon.rst Outdated Show resolved Hide resolved
docs/source/howto/interact.rst Show resolved Hide resolved
docs/source/installation/index.rst Outdated Show resolved Hide resolved
Alternatively, check out the :ref:`next steps guide <tutorial:next-steps>`.


Preinstalled environments
Copy link
Contributor

@GeigerJ2 GeigerJ2 Jun 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this section now after adding my other comment in which I mention MyBinder. Should we add another card here with the MyBinder link? For this, should the MyBinder files be directly integrated into aiida-core (this is how pyiron does it), or under a separate repo under aiidateam? Currently, I have a minimal repo under my own account: https://github.com/GeigerJ2/aiida-mybinder/
The setup works (including RMQ installation), just need to add the updated examples, and optionally pin a custom docker image, though, I don't think that's really all that necessary, as when people run it, the compiled image gets stored by MyBinder, making start-up quick in most cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would for now not include it in aiida-core to keep dev and release cycle simpler by having it be decoupled. What else is needed for a deployment? I would like to test it to see how it works. We played around with it a long time ago and it was essentially not worth much. It would take a huge amount of time on startup and maintenance costs were not negligible. But happy to have a look now if things have changed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. On the page of my repo you can click on the "launch binder" icon, which opens it via the following link, pasted here for convenience:
https://mybinder.org/v2/gh/GeigerJ2/aiida-mybinder/HEAD

Start-up time depends on a pre-built Docker image already being on their servers. If not, it takes some time to build it (it also gets rebuilt whenever there are changes to the repo). Though, for me, the waiting time was always acceptable (max. ~2 mins). If an image is built, it is stored on their servers, so that start-up is quicker in the future, though not sure how long it is persisted there. After too much inactivity time, these images get deleted, and the next person to launch it still has to wait for the build. If we have (hopefully ;) enough traffic, there will always be an image available (or we could automate periodic access to the link, though not sure how much they would like that eheh).

In terms of maintenance cost, I see it really minimal. One doesn't need anything apart from environment.yml for the conda setup:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/environment.yml

And (optionally) a postBuild file:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/.binder/postBuild

Which I currently use to install aiida-core from main to be able to use verdi presto. Possibly, we could also do fully serviceless and without conda, installing v2.6 from pypi, then this would be even less (though, could be nice to have QE through conda and showing some very simple materials science examples). With this little effort, having a direct playground for people in the browser without any local installation seems well worth it to me.

Then, I'd not try to use our own Dockerfile, it's discouraged by MyBinder everywhere that people ask for it and I don't really see much benefit, but much higher maintenance cost. The only part where we need to invest some time is by providing some nice examples and making sure they work, to make it actually useful. But this is something we do anyway for the doc-tutorials, so not much overhead either.

Haven't transferred the repo to aiidateam and properly integrated the example notebooks, as I'd like to be sure that we keep it beforehand.

docs/source/installation/guide_complete/index.rst Outdated Show resolved Hide resolved
docs/source/installation/guide_complete/index.rst Outdated Show resolved Hide resolved
docs/source/installation/guide_quick/index.rst Outdated Show resolved Hide resolved

pip install aiida-core

.. tab-item:: conda
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't tell users anymore about the option of installing the services via conda? Should we add this here again or in the relevant sections about the services?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was another problem point that I don't know how best to handle. The old approach of the docs was a "vertical" narrative, where you chose an installation "method" e.g., system installation, or conda, or docker, and it went through it from start to finish.

I don't like this approach as you end up having to write out the same things in each vertical option, things like verdi profile setup, checking the result with verdi status etc. But worse, I feel that users don't get an idea of what exactly is required for AiiDA and why. It would make it impossible for them to then get away from any of the examples provided by us and come up with a custom installation.

In this rewrite, I have favored the "horizontal" approach, if you will, where I focus more on the abstract what an AiiDA installation includes, and then it leaves more open as to how those parts are actually performed by the user. For example, when we say that the core.psql_dos plugin requires PostgreSQL, it is only there that we then tell the user "ok, you now figure out how you want to use PostgreSQL. That may be through installation on your machine (which again has many options) or you can use a service on another machine".

I just feel that it is untenable for us to map out all the possible options that can occur for different operating systems. Before, we were pretty much forced to provide some of these instructions, because without PSQL, you simply couldn't run AiiDA. However now, it is truly optional, so I think it is fine that we no longer have to provide a gazillion different possibilities of how to install it. It made the original docs unreadable in my opinion.

If you can find a nice way to include the instructions for the conda way, into the new docs outline, that doesn't mess with the main flow and keeps it readable if people don't use conda, I would definitely consider keeping it. But there have historically been a lot of issues with people using the conda-based setup, mostly because the services aren't automatically started upon reboot and the services can clash with equivalents that are running system wide.

Copy link
Contributor

@GeigerJ2 GeigerJ2 Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sphuber, thanks for the detailed write-up! I definitely see the problems with the previous vertical approach. I just skimmed through those again, and the flow is certainly hard to follow, always having tabs for different options (e.g. OSs, pip+venv / conda, etc.). The new version reads much smoother.

I just feel that it is untenable for us to map out all the possible options that can occur for different operating systems. Before, we were pretty much forced to provide some of these instructions, because without PSQL, you simply couldn't run AiiDA. However now, it is truly optional, so I think it is fine that we no longer have to provide a gazillion different possibilities of how to install it. It made the original docs unreadable in my opinion.

Yeah, I definitely agree with that!

Why don't we just add an Info box in which we link the installation instructions on the RMQ and PSQL webpages, similar to the "Other" tab for the prerequisites, and also mention that the aiida-core.services conda package exists, so people can alternatively install everything via conda. This provides at least some minimal information for the users without cluttering the flow too much? The expected possible issues with conda could then be noted in the troubleshooting section, in which we don't have to care about a nice text flow, if we want to mention them at all?

image

docs/source/installation/docker.rst Show resolved Hide resolved
docs/source/installation/docker.rst Show resolved Hide resolved
@sphuber
Copy link
Contributor Author

sphuber commented Jun 11, 2024

Thanks for the review @GeigerJ2 . I have accepted most of your suggestion. I have given more extended responses to your questions/suggestions related to the current structure of the docs.

Comment on lines 46 to 53
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_.

#. Create an environment and install ``aiida-core``:

.. code-block:: console

conda create -n aiida -c conda-forge aiida-core
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_.
#. Create an environment and install ``aiida-core``:
.. code-block:: console
conda create -n aiida -c conda-forge aiida-core
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_ or `mamba <https://mamba.readthedocs.io/en/stable/installation/mamba-installation.html>`. You might experience that `conda` needs a lot of time to resolve the dependencies. We therefore recommend `mamba` for the installation.
#. Create an environment and install ``aiida-core``:
.. code-block:: console
conda create -n aiida -c conda-forge aiida-core
# or
mamba create -n aiida -c conda-forge aiida-core

I think we should at least add a sentence that the user should consider to use mamba, since resolving the dependencies with conda takes like 5 mins. That is not bearable for a new user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point, but is that our problem really? Should we add another option to tell users to otherwise use micromamba because it takes less space? In my mind, these variations are not our responsability and listing them will make things more difficult to understand than is needed.

Copy link
Contributor

@agoscinski agoscinski Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a 5 minutes waiting time is not just an inconvenience but goes into the bug category. My behavior as a user was to stop it, because I thought there was a bug since I don't encounter so long resolving times often. If the package would take with conda 20GB and with micromamba just 1GB then it would be also worth to mention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the slowness of conda is very inconvenient and could be considered a bug, but that is a bug in conda surely and not AiiDA. Anyway, the mamba solver was integrated to conda in 2022 and has been the default solver since Oct 2023. So I would say that people that install conda now won't need mamba

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for the information! I would say AiiDA has still the reputation to be not installable with conda (No one here at PSI mentioned to me that I just need to update conda when I mentioned this problem), so do you think it is still worth to make a note about updating one's conda version if one experience slowdowns?

Suggested change
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_.
#. Create an environment and install ``aiida-core``:
.. code-block:: console
conda create -n aiida -c conda-forge aiida-core
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_.
#. Create an environment and install ``aiida-core``:
.. code-block:: console
conda create -n aiida -c conda-forge aiida-core
.. note::
With `conda` version 23.10.0 the solver that introduced huge slowdowns has been updated.
If you experience a long resolving time during installation, please consider updating `conda`.

Copy link
Contributor

@agoscinski agoscinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, this is much nicer!

Next Steps is listed in index list on the Tutorials page but not in the Selection Navigation (see https://aiida--6455.org.readthedocs.build/projects/aiida-core/en/6455/tutorials/index.html). I am not sure maybe one just needs to add this to the index.rst to fix it?

.. toctree::
   :maxdepth: 1

   basic
   tutorial:next-steps <-- add this line
   Additional tutorials <https://aiida-tutorials.readthedocs.io/en/latest/>

Or maybe even simpler, just make Next steps a new file.

Comment on lines 198 to 199
The ``verdi profile setup`` command by default configures RabbitMQ as the broker and prompts for the connection parameters.
If RabbitMQ is not available or the profile should be configured without a broker for another reason, run:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With PR #6480 this logic has changed a bit, just make sure to update it in that PR

docs/source/installation/guide_quick/index.rst Outdated Show resolved Hide resolved
Alternatively, check out the :ref:`next steps guide <tutorial:next-steps>`.


Preinstalled environments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. On the page of my repo you can click on the "launch binder" icon, which opens it via the following link, pasted here for convenience:
https://mybinder.org/v2/gh/GeigerJ2/aiida-mybinder/HEAD

Start-up time depends on a pre-built Docker image already being on their servers. If not, it takes some time to build it (it also gets rebuilt whenever there are changes to the repo). Though, for me, the waiting time was always acceptable (max. ~2 mins). If an image is built, it is stored on their servers, so that start-up is quicker in the future, though not sure how long it is persisted there. After too much inactivity time, these images get deleted, and the next person to launch it still has to wait for the build. If we have (hopefully ;) enough traffic, there will always be an image available (or we could automate periodic access to the link, though not sure how much they would like that eheh).

In terms of maintenance cost, I see it really minimal. One doesn't need anything apart from environment.yml for the conda setup:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/environment.yml

And (optionally) a postBuild file:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/.binder/postBuild

Which I currently use to install aiida-core from main to be able to use verdi presto. Possibly, we could also do fully serviceless and without conda, installing v2.6 from pypi, then this would be even less (though, could be nice to have QE through conda and showing some very simple materials science examples). With this little effort, having a direct playground for people in the browser without any local installation seems well worth it to me.

Then, I'd not try to use our own Dockerfile, it's discouraged by MyBinder everywhere that people ask for it and I don't really see much benefit, but much higher maintenance cost. The only part where we need to invest some time is by providing some nice examples and making sure they work, to make it actually useful. But this is something we do anyway for the doc-tutorials, so not much overhead either.

Haven't transferred the repo to aiidateam and properly integrated the example notebooks, as I'd like to be sure that we keep it beforehand.


pip install aiida-core

.. tab-item:: conda
Copy link
Contributor

@GeigerJ2 GeigerJ2 Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sphuber, thanks for the detailed write-up! I definitely see the problems with the previous vertical approach. I just skimmed through those again, and the flow is certainly hard to follow, always having tabs for different options (e.g. OSs, pip+venv / conda, etc.). The new version reads much smoother.

I just feel that it is untenable for us to map out all the possible options that can occur for different operating systems. Before, we were pretty much forced to provide some of these instructions, because without PSQL, you simply couldn't run AiiDA. However now, it is truly optional, so I think it is fine that we no longer have to provide a gazillion different possibilities of how to install it. It made the original docs unreadable in my opinion.

Yeah, I definitely agree with that!

Why don't we just add an Info box in which we link the installation instructions on the RMQ and PSQL webpages, similar to the "Other" tab for the prerequisites, and also mention that the aiida-core.services conda package exists, so people can alternatively install everything via conda. This provides at least some minimal information for the users without cluttering the flow too much? The expected possible issues with conda could then be noted in the troubleshooting section, in which we don't have to care about a nice text flow, if we want to mention them at all?

image


To get a working installation of AiiDA, the following steps are required:

#. :ref:`Install the Python Package <installation:guide-complete:python-package>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now that I thought about it again, I'm also on board with that. How about adding something like:

  1. [Optional] Use additional services in AiiDA
  2. [Optional] Setup AiiDA with additional services

Just to notify the user that these points will follow in the guide? Also phrasing it in that way emphasizes that these are not installation instructions for the services, which we don't provide, but rather how to set things up for AiiDA when they are already installed. We can still add an Info box with the links to the installation instructions on their webpages and the conda route, as I mentioned in my other comment.

In addition to this, there's a heading for RabbitMQ. We could also add a # Storage Plugins heading, and make core.sqlite_dos and core.psql_dos subsections. Right now, as they are formatted as code, they don't even seem like headings at first glance.

Regarding the instructions for RMQ, we could make the table with the parameters foldable to hide most of that away initially? In principle, I think it's fine to mention RMQ in the way it is done there, no need to go through its installation.


.. _installation:guide-complete:python-package:optional-requirements:

Optional requirements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why this is here in the section "Install Python Package", though, I propose we move that to the end, or fold it away. Most of these additional requirements (possibly apart from atomic_tools and notebook) are for devs and not relevant for the average user. We could mention those and make the full list foldable to reduce distractions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these additional requirements (possibly apart from atomic_tools and notebook) are for devs and not relevant for the average user.

Are you sure? To me, only 3 out of 8 are just for development (docs, pre-commit, tests).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, maybe I worded it poorly. While the other requirements like ssh_kerberos, rest, and tui might not be strictly for devs, I still think average users shouldn't be exposed to them that early (i.e. even before they are told how to create a profile).

ssh setup and Kerberos are probably just chores they have to do for authentication purposes (so not too interesting), then they will probably be unfamiliar with REST APIs (and likely not something they'll need any time soon), and the tui (while looking sexy af :3), doesn't really provide anything beyond the verdi CLI (or does it?). Hence, my proposal to move it further down (possibly before "Validate installation"?) to have the nice, straightforward flow of:
"Install Python Package" → "Create a profile"

That being said, I don't think it's extremely important, and I'm intentionally very picky with this PR, as I think the Installation page is super important. Cheers!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. I agree that it can be superfluous information for most cases, which is why I didn't include it at all in the quick install guide. However, here we are in the complete guide, so I think it is justified to provide this information. And although flow is important, I don't think it is that disruptive that we should move it elsewhere, where it would almost always be more disconnected from the most logical place. But as mentioned elsewhere, I would prefer to discuss this in person with everybody else, so we can hash all these suggestions out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. Overall, I'm also fine with leaving it there. Let's put it on the agenda of the upcoming aiidateam meeting. Should be the last, more involved blocking PR that we should all be happy with 👍🏽


.. note::
The ``verdi presto`` command automatically checks if RabbitMQ is running on the localhost.
If it can successfully connect, it configures the profile with the message broker and therefore the daemon functionality will be available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sphuber, very clear,
However, because running on a remote may be important for newcomers, I would quickly mention that, if you are interested in installing RabbitMq before'verdi presto', please check here --a link to RabbitMQ section of complete installation guide--.

Suggested change
If it can successfully connect, it configures the profile with the message broker and therefore the daemon functionality will be available.
If it can successfully connect, it configures the profile with the message broker and therefore the daemon functionality will be available.
As a side note you can follow :ref:`guide to install RabbitMQ <installation:guide-complete:create-profile:rabbitmq>` before creating a profile with `verdi presto`, if you'd like to have functional daemon which is responsible to submit jobs on a remote computer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence you added is incorrect though. You don't need to have a daemon to run on remote computers. So RabbitMQ is not required to run calculations on remote computers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's not possible to submit jobs without daemon running, maybe I'm wrong ?
But I agree it's better to stress that you can still run on remote computer but cannot submit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's not possible to submit jobs without daemon running

What is probably confusing you is the usage of "submit" in AiiDA lingo. We distinguish between running and submitting processes. The former performs the process in a local runner, whereas the latter has the process executed by a daemon runner. This is where RabbitMQ is necessary because that is how we can communicate with daemon runners. It has nothing to do with "submitting" to a scheduler on a remote computer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see, thanks.
Then I would really change the last sentence in functionality paragraph. As it's a technical thing, it conveys the confusing message to beginners like me. Those who want easy installation might not be familiar with AiiDA lingo:

"As a result, the daemon cannot be started and processes cannot be submitted to it but can only be run locally."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any suggestions? Not sure how better to explain it. I guess I could add the literal consequence:

That is to say, it is only possible to use :meth:`aiida.engine.launch.run` to launch a process and :meth:`aiida.engine.launch.submit` is not available.

But that is even more technical and may confuse new users even more. I really think the simplest is just saying "the daemon functionality is not available" (which means starting the daemon and submitting processes to it). How else would you phrase/explain it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it explicitly in real application. It's a important piece of information that submitting to remote computer is functional.

Maybe this ?:

++"As a result, the daemon functionality is not available. Note, although easy instillation is fully functional with submitting jobs to a remote computer, some features [like this and that] are limitted, if you plan to heavily load AiiDA with synchronous tasks we recommend install RabbitMQ :ref:guide to install RabbitMQ <installation:guide-complete:create-profile:rabbitmq> to make daemon functional."

--"As a result, the daemon cannot be started and processes cannot be submitted to it but can only be run locally."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough. I guess I just don't really understand why you even thought that running on remote computers was not supported without the daemon. Nowhere was this particular feature of AiiDA mentioned. So it seems weird to me to single this one out as explicitly stating that it is still supported without a broker. Should we also mention that caching is still supported? Or workflows, or any other feature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, honestly this was a question raised during Marvel review a few days ago. Some body asked if with the new verdi presto one can still submit jobs, we didn't know the answer, and although many of us, were present, nobody corrected it... that made me think maybe it is like that.. perhaps AiiDA lingo confusion also playing a part..

Apart from that, to be honest with you, I don't understand at all what RabbitMQ does exactly in AiiDA unfortunately (apart from being a message broker, etc....), I haven't explored that part of AiiDA yet.


.. note::
The ``verdi presto`` command can also create a profile using PostgreSQL instead of SQLite.
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention them just to be clear

Suggested change
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options.
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options: e.g. `--postgres-hostname`, `--postgres-port`, `--postgres-username` and `--postgres-password`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember @mbercx objections to having these options for verdi presto in the first place, which I also agree with. Ideally, we just have --use-postgres here, without all the other --postgres-... stuff. I think the default db name (which is also currently a missing option --postgres-name) and db user seem are sensible, and if people want to configure more things, they should use verdi profile setup. Don't want to be too much of a scope creep here, but we should probably agree on this, as, once we add options and release, we have to keep them due to backwards compatibility...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things here:

I remember @mbercx objections to having these options for verdi presto in the first place, which I also agree with.

I think you are thinking of his objection to having the RabbitMQ connection configuration options in verdi profile setup. This is addressed in #6480 .

What he did remark with regards to the --postgres-* options is that he thought it wasn't consistent with the same options for verdi profile setup core.psql_dos where they are named --database-*. The problem is that we cannot change these for the core.psql_dos as they are taken from the pydantic model fields and are already part in the actual config.json. If we rename them, all configs need to be migrated as well. Alternative is to rename the options for verdi presto to use --database-* instead, but I find that less ideal because --postgres- is clearer.

I think the default db name (which is also currently a missing option --postgres-name)

This is intentional. If you want the functionality that automatically creates a PSQL database, then why provide an option that let's you specify the name. In what use case would this be necessary?

and if people want to configure more things, they should use verdi profile setup

But there they won't have the functionality that the database is created automatically, as you yourself would also want to keep. The options I added to verdi presto are necessary to keep the automatic database creation functionality useful, as the connection details to the PSQL server are not always postgres:postgres@localhost:5432 but the user, hostname and port can change. If the user cannot specify this, the command is useless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are thinking of his objection to having the RabbitMQ connection configuration options in verdi profile setup. This is addressed in #6480.

No, I talked with him the other day, and it was about the --postgres-* options of verdi presto. Though, let's maybe wait until he responds, as I don't want to put (even more ^^) words in his mouth.

Though, I also think that the fact that these are always shown is suboptimal, as they are not relevant for anybody using SQLite and might just cause confusion for new users. And I understand it's not possible to generate them dynamically like it is done for verdi profile setup core.psql_dos, which would be the ideal solution I suppose.

What he did remark with regards to the --postgres-* options is that he thought it wasn't consistent with the same options for verdi profile setup core.psql_dos where they are named --database-*.

Yes, I also noted this in my other PR.

The problem is that we cannot change these for the core.psql_dos as they are taken from the pydantic model fields and are already part in the actual config.json. If we rename them, all configs need to be migrated as well.

That makes sense, though, thanks for the clarification!

This is intentional. If you want the functionality that automatically creates a PSQL database, then why provide an option that let's you specify the name. In what use case would this be necessary?

My reasoning is that, if one can specify all the other options for the db setup, then why not be able to also specify the name of the created db? I don't see the difference to the other options here, am I missing something?

But there they won't have the functionality that the database is created automatically, as you yourself would also want to keep.

Fair enough, I didn't consider that there is no option to generate it automatically for verdi profile setup core.psql_dos. That being said, why not add an option there which enables this? As we ask for all these options anyway, why not give the user the choice to let it be created by AiiDA on-the-fly?

The options I added to verdi presto are necessary to keep the automatic database creation functionality useful, as the connection details to the PSQL server are not always postgres:postgres@localhost:5432 but the user, hostname and port can change. If the user cannot specify this, the command is useless.

But is the command really "not useful" without these options? I guess for 99% of the intended users of the command, new users, they will have this setup, and will probably not even be aware of the existence of database user, hostname, and port options. Personally, I never found myself in a situation where I had to change these settings (though, my scope might be too narrow here). Assuming we have a flag for automatic database creation for verdi profile setup core.psql_dos as I propose above, removing the options from verdi presto and just running with the defaults seems sensible.

Copy link
Contributor Author

@sphuber sphuber Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning is that, if one can specify all the other options for the db setup, then why not be able to also specify the name of the created db? I don't see the difference to the other options here, am I missing something?

The whole point of this functionality in verdi presto is that it can automatically create the required database (and user) in a PostgreSQL server. Typically, the user doesn't care about the details and so we just generate one. However, the user does need to be able to specify where this server is. Although in many cases it may be running on the localhost on port 5432, that is not always the case. The same goes for the username and password with which to connect to PSQL. Although in some cases the defaults may work, or by resorting to using sudo, in some cases the user needs to specify a specific user account which has the rights to create a new database (and optionally a user).

Fair enough, I didn't consider that there is no option to generate it automatically for verdi profile setup core.psql_dos. That being said, why not add an option there which enables this? As we ask for all these options anyway, why not give the user the choice to let it be created by AiiDA on-the-fly?

Because the commands are generated on-the-fly for each storage plugin and although the storage plugins can customize which options are added, they cannot provide custom logic as would be required for the auto-resource creation for PSQL.

But is the command really "not useful" without these options? I guess for 99% of the intended users of the command, new users, they will have this setup, and will probably not even be aware of the existence of database user, hostname, and port options. Personally, I never found myself in a situation where I had to change these settings (though, my scope might be too narrow here). Assuming we have a flag for automatic database creation for verdi profile setup core.psql_dos as I propose above, removing the options from verdi presto and just running with the defaults seems sensible.

I guess this is the crux of the disagreement/misunderstanding: if we were able to add the auto-generation functionality to verdi profile core.psql_dos then yes I would agree that it doesn't make sense to add it to verdi presto. However, this is currently not the case, and that is the reason I added it to verdi presto (which essentially replaces verdi quicksetup).

Copy link
Contributor

@GeigerJ2 GeigerJ2 Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the crux of the disagreement/misunderstanding: if we were able to add the auto-generation functionality to verdi profile core.psql_dos then yes I would agree that it doesn't make sense to add it to verdi presto.

Yeah, I agree that's the issue... if there's no other way to have the automatic creation, then it should be at least under verdi presto. Maybe @mbercx has some other idea here?

This is intentional. If you want the functionality that automatically creates a PSQL database, then why provide an option that let's you specify the name. In what use case would this be necessary?

The use case would be that maybe I'm too lazy to generate a PSQL database manually beforehand (e.g. because I don't remember the sequence of commands to do that), so I just let verdi presto do it for me, while I still have a specific db name in mind that I would want it to have? I understand that the other options are necessary to be able to connect to the PSQL-db, while the db name is not essential. Though, I don't see a problem with also allowing to customize that, as it seems unnecessarily restrictive otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I still have a specific db name in mind that I would want it to have?

Why though? The idea from the beginning was to keep verdi presto as automatic and simple as possible, only adding what was necessary. Initially that meant there was no automatic database creation. But you and other users still used that functionality from verdi quicksetup, which I agree makes sense. So I added that functionality, but only the bare minimum. I really think this is important. Because why when we allow specifying the database name, would we also not allow specifying the database username and password? This is unnecessary complexity that detracts from verdi presto.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my initial gut reaction towards seeing the --postgres-X options was more that I wanted to keep verdi presto "clean" and "simple", without adding too many options, in line with @sphuber's last comment above. I also was wondering what we should do if we add another storage backend in the future that might need options.

@sphuber I think you saw this reaction live during our first configure-rabbitmq meeting. But I do have a lot of objections about many things, so I understand how they can be mixed up. ^^

Specifying the database name

The main reason I started using verdi quicksetup again was because I could specify the database name, which allowed me to easily identify the database when I wanted to do anything directly in PostgreSQL (e.g. for backups, hacking codes/computers, checking DB size, ...). So one use case for being able to specify the database name is for more advanced users that like the convenience of verdi presto but still want to have easily identifiable database names.

Another use case was users that can't create databases themselves, but have an admin who creates those for them (there were 2 users in ETH who were faced with this problem). Interestingly, verdi quicksetup also helped here, since if you specify an existing database, it simply warns that the database exists but still uses it.

Thinking about it some more though, the verdi profile setup core.psql_dos command using non-interactive mode only requires:

  • --profile
  • --database-username
  • --database-name
  • --database-password
  • --repository-uri

The first four are fine to ask, and the username and password could perhaps have stored defaults (similar to user email etc)? The --repository-uri could also have the same default as verdi presto, and should probably be easier to specify (now you need to prepend it with file://, which is not very user-friendly).

So I think it's fine to ask users that want to specify their database to run createdb and use verdi profile setup core.psql_dos, but we can still work towards making this smoother.

The --postgres-X options

I'm still leaning towards keeping only --use-postgres here, to keep the verdi presto API as clean as possible. One very nice aspect of verdi presto is the fact that it tries to hide complexity from the user if it can. For RabbitMQ this works very well, if the connection fails no broker is used and now it's possible for the user to configure it later. For PostgreSQL this of course isn't possible, since you need to create the database when creating the profile. So I see why @sphuber feels it's necessary to add the minimum options to verdi presto.

But is it really necessary? If --use-postgres is set and the database creation fails, the user is prompted for the required information:

❯ verdi presto --use-postgres
Unable to autodetect postgres setup.
Please provide PostgreSQL connection info:
postgres host [localhost]: 
postgres port [5432]: 
postgres super user [postgres]: 
database [template1]: 
postgres password of {dsn_new["user"]} []:

Although here it does ask for the database name and the password of user that will be created, but not the superuser password?

Conclusion

I would keep the verdi presto API as clean as possible, but:

  • Improve the interactive experience when the defaults fail to automatically create the database.
  • Store the defaults as provided by the users, so future verdi presto commands work smoothly.
  • Allow the user to configure the defaults with verdi config (I'm not sure where these are stored? I don't see them in verdi config list).

However, I'm still not 100% convinced of this approach either. My main concern with keeping the options is that once we release, we can't remove them anymore. And I think we all agree we want to keep verdi presto as simple as possible.

@khsrali
Copy link
Contributor

khsrali commented Jun 20, 2024

Just one more quick comment:
On the bottom of page "Complete installation guide", the Next button navigates to Docker page,
I understand this is done automatically by sphinx, but cannot we direct it to Tutorials instead? to me this seems more relevant.

@@ -0,0 +1,83 @@
.. _installation:guide-quick:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: is there a reason why these are in separate folders instead of just a single .rst file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was because I was anticipating that they would have several files, especially for the complete guide. Files for RabbitMQ and PSQL for example. If we end up not using it, I will convert them to single files


.. note::
The ``verdi presto`` command can also create a profile using PostgreSQL instead of SQLite.
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my initial gut reaction towards seeing the --postgres-X options was more that I wanted to keep verdi presto "clean" and "simple", without adding too many options, in line with @sphuber's last comment above. I also was wondering what we should do if we add another storage backend in the future that might need options.

@sphuber I think you saw this reaction live during our first configure-rabbitmq meeting. But I do have a lot of objections about many things, so I understand how they can be mixed up. ^^

Specifying the database name

The main reason I started using verdi quicksetup again was because I could specify the database name, which allowed me to easily identify the database when I wanted to do anything directly in PostgreSQL (e.g. for backups, hacking codes/computers, checking DB size, ...). So one use case for being able to specify the database name is for more advanced users that like the convenience of verdi presto but still want to have easily identifiable database names.

Another use case was users that can't create databases themselves, but have an admin who creates those for them (there were 2 users in ETH who were faced with this problem). Interestingly, verdi quicksetup also helped here, since if you specify an existing database, it simply warns that the database exists but still uses it.

Thinking about it some more though, the verdi profile setup core.psql_dos command using non-interactive mode only requires:

  • --profile
  • --database-username
  • --database-name
  • --database-password
  • --repository-uri

The first four are fine to ask, and the username and password could perhaps have stored defaults (similar to user email etc)? The --repository-uri could also have the same default as verdi presto, and should probably be easier to specify (now you need to prepend it with file://, which is not very user-friendly).

So I think it's fine to ask users that want to specify their database to run createdb and use verdi profile setup core.psql_dos, but we can still work towards making this smoother.

The --postgres-X options

I'm still leaning towards keeping only --use-postgres here, to keep the verdi presto API as clean as possible. One very nice aspect of verdi presto is the fact that it tries to hide complexity from the user if it can. For RabbitMQ this works very well, if the connection fails no broker is used and now it's possible for the user to configure it later. For PostgreSQL this of course isn't possible, since you need to create the database when creating the profile. So I see why @sphuber feels it's necessary to add the minimum options to verdi presto.

But is it really necessary? If --use-postgres is set and the database creation fails, the user is prompted for the required information:

❯ verdi presto --use-postgres
Unable to autodetect postgres setup.
Please provide PostgreSQL connection info:
postgres host [localhost]: 
postgres port [5432]: 
postgres super user [postgres]: 
database [template1]: 
postgres password of {dsn_new["user"]} []:

Although here it does ask for the database name and the password of user that will be created, but not the superuser password?

Conclusion

I would keep the verdi presto API as clean as possible, but:

  • Improve the interactive experience when the defaults fail to automatically create the database.
  • Store the defaults as provided by the users, so future verdi presto commands work smoothly.
  • Allow the user to configure the defaults with verdi config (I'm not sure where these are stored? I don't see them in verdi config list).

However, I'm still not 100% convinced of this approach either. My main concern with keeping the options is that once we release, we can't remove them anymore. And I think we all agree we want to keep verdi presto as simple as possible.

@sphuber
Copy link
Contributor Author

sphuber commented Jun 24, 2024

I think we may need a meeting to discuss in person, because going back-and-forth on github doesn't seem very efficient.

I think my initial gut reaction towards seeing the --postgres-X options was more that I wanted to keep verdi presto "clean" and "simple", without adding too many options

I fully agree. And the only options that I added are those that are necessary.

I also was wondering what we should do if we add another storage backend in the future that might need options.

We shouldn't. In my opinion, we are adding this functionality just for the core.psql_dos for historical reasons to keep the functionality that was originally in verdi quicksetup. I don't think that we should cater to needs of other backends, for example, some of the plugins of aiida-s3 that connect to an object store. I don't think we should have to implement custom logic that makes it easier or automates the creation of the required resources, such as the containers/buckets. That is up to the user.

The main reason I started using verdi quicksetup again was because I could specify the database name, which allowed me to easily identify the database when I wanted to do anything directly in PostgreSQL (e.g. for backups, hacking codes/computers, checking DB size, ...). So one use case for being able to specify the database name is for more advanced users that like the convenience of verdi presto but still want to have easily identifiable database names.

This seems to be contradictory. On the one hand you want to keep verdi presto simple, but then you want to start adding options that just happen to be useful to your particular use case. Can't have your cake and eat it too my man :D

The main reason I started using verdi quicksetup again was because I could specify the database name, which allowed me to easily identify the database when I wanted to do anything directly in PostgreSQL (e.g. for backups, hacking codes/computers, checking DB size, ...)

Just run verdi profile show and the database name is there. I don't think this "hurdle" justifies going against the principle of keeping verdi presto as simple/lean as possible.

Another use case was users that can't create databases themselves, but have an admin who creates those for them (there were 2 users in ETH who were faced with this problem). Interestingly, verdi quicksetup also helped here, since if you specify an existing database, it simply warns that the database exists but still uses it.

If the database already exists, why use verdi presto/quicksetup then? Just use verdi profile setup core.psql_dos. Why are we complicating things?

The --repository-uri could also have the same default as verdi presto

This should be easy to fix.

and should probably be easier to specify (now you need to prepend it with file://, which is not very user-friendly).

The file:// protocol prefix is historical and, I agree, unfortunate. We could change it by removing it, but it would require a config migration. Might still be worth doing because it has been annoying me for years. It was added I believe in the very beginning to allow in the future changing protocols for the file repository, but this is now properly addressed through storage plugins, I would say.

But is it really necessary? If --use-postgres is set and the database creation fails, the user is prompted for the required information:

I don't think relying on prompts is good design. At least it should not be required. There should always be a non-interactive option. I don't see the problem though. The options are exactly that, optional. They only show up if the user looks at the help text. How is that "confusing"? I could see the point with the old setup commands where all options would always be prompted for, even if users often would just want to use the results. But here, they are not even prompted for it. They just show up in the help text. I don't see a problem with that if it means that it provides all necessary options for all potential variants of PSQL servers.

@mbercx
Copy link
Member

mbercx commented Jun 24, 2024

We shouldn't. In my opinion, we are adding this functionality just for the core.psql_dos for historical reasons to keep the functionality that was originally in verdi quicksetup

Makes sense, I agree. 👍

Can't have your cake and eat it too my man :D
If the database already exists, why use verdi presto/quicksetup then? Just use verdi profile setup core.psql_dos. Why are we complicating things?

Haha, completely fair. It's also why I argued that in the end making the usage of verdi profile setup core.psql_dos smoother was the way to go for these use cases. Perhaps I should have led with that. ^^

I don't think relying on prompts is good design. At least it should not be required. There should always be a non-interactive option. I don't see the problem though. The options are exactly that, optional. They only show up if the user looks at the help text. How is that "confusing"?

Perhaps, we are probably a bit too hung up on these extra options in our zeal of keeping verdi presto minimal, making a mountain out of a molehill. One non-interactive alternative would be to use verdi config to set the postgresql defaults for an AiiDA instance (which could even at some point be overridden globally), and then these are automatically picked up by verdi presto. But having to execute two commands is also not ideal (though that's also true in case the user has a non-standard RabbitMQ configuration). Another argument might be that for any non-interactive use case, verdi profile setup core.psql_dos could be used. But in the end, I'm starting to feel more and more obstinate as I'm writing this, so maybe it's best to acquiesce and move on. ^^

@sphuber sphuber force-pushed the fix/docs-installation branch 2 times, most recently from f262916 to dc2f84f Compare June 28, 2024 14:03
The `verdi setup` and `verdi quicksetup` commands have been deprecated
and replaced by `verdi profile setup` and `verdi presto`. The
installation docs were heavily outdated and the flow was scattered.

The biggest change is that there now is a "quick install guide" that
relies on `verdi presto` to provide an install route that is fool proof
and will work on almost any system in a minimal amount of commands.

Then there is the "complete installation guide" that provides all the
details necessary to fully customize an installation.
@sphuber sphuber force-pushed the fix/docs-installation branch from dc2f84f to cdf3f18 Compare June 28, 2024 14:22
@sphuber sphuber merged commit 0ee0a0c into aiidateam:main Jun 28, 2024
4 checks passed
@sphuber sphuber deleted the fix/docs-installation branch June 28, 2024 21:01
mikibonacci pushed a commit to mikibonacci/aiida-core that referenced this pull request Sep 3, 2024
The `verdi setup` and `verdi quicksetup` commands have been deprecated
and replaced by `verdi profile setup` and `verdi presto`. The
installation docs were heavily outdated and the flow was scattered.

The biggest change is that there now is a "quick install guide" that
relies on `verdi presto` to provide an install route that is fool proof
and will work on almost any system in a minimal amount of commands.

Then there is the "complete installation guide" that provides all the
details necessary to fully customize an installation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/critical-blocking must be resolved before next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants