-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs: Rework the installation section #6455
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work, thanks a lot, @sphuber! I added a few comments. Might go through it a second time in the following days with a fresh pair of 👀, but wanted to already save my comments up to this point.
Alternatively, check out the :ref:`next steps guide <tutorial:next-steps>`. | ||
|
||
|
||
Preinstalled environments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw this section now after adding my other comment in which I mention MyBinder. Should we add another card here with the MyBinder link? For this, should the MyBinder files be directly integrated into aiida-core
(this is how pyiron
does it), or under a separate repo under aiidateam
? Currently, I have a minimal repo under my own account: https://github.com/GeigerJ2/aiida-mybinder/
The setup works (including RMQ installation), just need to add the updated examples, and optionally pin a custom docker image, though, I don't think that's really all that necessary, as when people run it, the compiled image gets stored by MyBinder, making start-up quick in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would for now not include it in aiida-core
to keep dev and release cycle simpler by having it be decoupled. What else is needed for a deployment? I would like to test it to see how it works. We played around with it a long time ago and it was essentially not worth much. It would take a huge amount of time on startup and maintenance costs were not negligible. But happy to have a look now if things have changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. On the page of my repo you can click on the "launch binder" icon, which opens it via the following link, pasted here for convenience:
https://mybinder.org/v2/gh/GeigerJ2/aiida-mybinder/HEAD
Start-up time depends on a pre-built Docker image already being on their servers. If not, it takes some time to build it (it also gets rebuilt whenever there are changes to the repo). Though, for me, the waiting time was always acceptable (max. ~2 mins). If an image is built, it is stored on their servers, so that start-up is quicker in the future, though not sure how long it is persisted there. After too much inactivity time, these images get deleted, and the next person to launch it still has to wait for the build. If we have (hopefully ;) enough traffic, there will always be an image available (or we could automate periodic access to the link, though not sure how much they would like that eheh).
In terms of maintenance cost, I see it really minimal. One doesn't need anything apart from environment.yml
for the conda setup:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/environment.yml
And (optionally) a postBuild
file:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/.binder/postBuild
Which I currently use to install aiida-core
from main
to be able to use verdi presto
. Possibly, we could also do fully serviceless and without conda
, installing v2.6 from pypi, then this would be even less (though, could be nice to have QE through conda
and showing some very simple materials science examples). With this little effort, having a direct playground for people in the browser without any local installation seems well worth it to me.
Then, I'd not try to use our own Dockerfile, it's discouraged by MyBinder everywhere that people ask for it and I don't really see much benefit, but much higher maintenance cost. The only part where we need to invest some time is by providing some nice examples and making sure they work, to make it actually useful. But this is something we do anyway for the doc-tutorials, so not much overhead either.
Haven't transferred the repo to aiidateam
and properly integrated the example notebooks, as I'd like to be sure that we keep it beforehand.
|
||
pip install aiida-core | ||
|
||
.. tab-item:: conda |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't tell users anymore about the option of installing the services via conda
? Should we add this here again or in the relevant sections about the services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was another problem point that I don't know how best to handle. The old approach of the docs was a "vertical" narrative, where you chose an installation "method" e.g., system installation, or conda, or docker, and it went through it from start to finish.
I don't like this approach as you end up having to write out the same things in each vertical option, things like verdi profile setup
, checking the result with verdi status
etc. But worse, I feel that users don't get an idea of what exactly is required for AiiDA and why. It would make it impossible for them to then get away from any of the examples provided by us and come up with a custom installation.
In this rewrite, I have favored the "horizontal" approach, if you will, where I focus more on the abstract what an AiiDA installation includes, and then it leaves more open as to how those parts are actually performed by the user. For example, when we say that the core.psql_dos
plugin requires PostgreSQL, it is only there that we then tell the user "ok, you now figure out how you want to use PostgreSQL. That may be through installation on your machine (which again has many options) or you can use a service on another machine".
I just feel that it is untenable for us to map out all the possible options that can occur for different operating systems. Before, we were pretty much forced to provide some of these instructions, because without PSQL, you simply couldn't run AiiDA. However now, it is truly optional, so I think it is fine that we no longer have to provide a gazillion different possibilities of how to install it. It made the original docs unreadable in my opinion.
If you can find a nice way to include the instructions for the conda way, into the new docs outline, that doesn't mess with the main flow and keeps it readable if people don't use conda, I would definitely consider keeping it. But there have historically been a lot of issues with people using the conda-based setup, mostly because the services aren't automatically started upon reboot and the services can clash with equivalents that are running system wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sphuber, thanks for the detailed write-up! I definitely see the problems with the previous vertical approach. I just skimmed through those again, and the flow is certainly hard to follow, always having tabs for different options (e.g. OSs, pip+venv / conda, etc.). The new version reads much smoother.
I just feel that it is untenable for us to map out all the possible options that can occur for different operating systems. Before, we were pretty much forced to provide some of these instructions, because without PSQL, you simply couldn't run AiiDA. However now, it is truly optional, so I think it is fine that we no longer have to provide a gazillion different possibilities of how to install it. It made the original docs unreadable in my opinion.
Yeah, I definitely agree with that!
Why don't we just add an Info
box in which we link the installation instructions on the RMQ and PSQL webpages, similar to the "Other" tab for the prerequisites, and also mention that the aiida-core.services
conda
package exists, so people can alternatively install everything via conda
. This provides at least some minimal information for the users without cluttering the flow too much? The expected possible issues with conda
could then be noted in the troubleshooting section, in which we don't have to care about a nice text flow, if we want to mention them at all?
Thanks for the review @GeigerJ2 . I have accepted most of your suggestion. I have given more extended responses to your questions/suggestions related to the current structure of the docs. |
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_. | ||
|
||
#. Create an environment and install ``aiida-core``: | ||
|
||
.. code-block:: console | ||
|
||
conda create -n aiida -c conda-forge aiida-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_. | |
#. Create an environment and install ``aiida-core``: | |
.. code-block:: console | |
conda create -n aiida -c conda-forge aiida-core | |
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_ or `mamba <https://mamba.readthedocs.io/en/stable/installation/mamba-installation.html>`. You might experience that `conda` needs a lot of time to resolve the dependencies. We therefore recommend `mamba` for the installation. | |
#. Create an environment and install ``aiida-core``: | |
.. code-block:: console | |
conda create -n aiida -c conda-forge aiida-core | |
# or | |
mamba create -n aiida -c conda-forge aiida-core | |
I think we should at least add a sentence that the user should consider to use mamba, since resolving the dependencies with conda takes like 5 mins. That is not bearable for a new user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the point, but is that our problem really? Should we add another option to tell users to otherwise use micromamba
because it takes less space? In my mind, these variations are not our responsability and listing them will make things more difficult to understand than is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a 5 minutes waiting time is not just an inconvenience but goes into the bug category. My behavior as a user was to stop it, because I thought there was a bug since I don't encounter so long resolving times often. If the package would take with conda 20GB and with micromamba just 1GB then it would be also worth to mention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the slowness of conda is very inconvenient and could be considered a bug, but that is a bug in conda surely and not AiiDA. Anyway, the mamba solver was integrated to conda in 2022 and has been the default solver since Oct 2023. So I would say that people that install conda now won't need mamba
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks for the information! I would say AiiDA has still the reputation to be not installable with conda (No one here at PSI mentioned to me that I just need to update conda when I mentioned this problem), so do you think it is still worth to make a note about updating one's conda version if one experience slowdowns?
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_. | |
#. Create an environment and install ``aiida-core``: | |
.. code-block:: console | |
conda create -n aiida -c conda-forge aiida-core | |
#. Install `conda <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_. | |
#. Create an environment and install ``aiida-core``: | |
.. code-block:: console | |
conda create -n aiida -c conda-forge aiida-core | |
.. note:: | |
With `conda` version 23.10.0 the solver that introduced huge slowdowns has been updated. | |
If you experience a long resolving time during installation, please consider updating `conda`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, this is much nicer!
Next Steps
is listed in index list on the Tutorials
page but not in the Selection Navigation (see https://aiida--6455.org.readthedocs.build/projects/aiida-core/en/6455/tutorials/index.html). I am not sure maybe one just needs to add this to the index.rst to fix it?
.. toctree::
:maxdepth: 1
basic
tutorial:next-steps <-- add this line
Additional tutorials <https://aiida-tutorials.readthedocs.io/en/latest/>
Or maybe even simpler, just make Next steps
a new file.
The ``verdi profile setup`` command by default configures RabbitMQ as the broker and prompts for the connection parameters. | ||
If RabbitMQ is not available or the profile should be configured without a broker for another reason, run: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With PR #6480 this logic has changed a bit, just make sure to update it in that PR
Alternatively, check out the :ref:`next steps guide <tutorial:next-steps>`. | ||
|
||
|
||
Preinstalled environments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. On the page of my repo you can click on the "launch binder" icon, which opens it via the following link, pasted here for convenience:
https://mybinder.org/v2/gh/GeigerJ2/aiida-mybinder/HEAD
Start-up time depends on a pre-built Docker image already being on their servers. If not, it takes some time to build it (it also gets rebuilt whenever there are changes to the repo). Though, for me, the waiting time was always acceptable (max. ~2 mins). If an image is built, it is stored on their servers, so that start-up is quicker in the future, though not sure how long it is persisted there. After too much inactivity time, these images get deleted, and the next person to launch it still has to wait for the build. If we have (hopefully ;) enough traffic, there will always be an image available (or we could automate periodic access to the link, though not sure how much they would like that eheh).
In terms of maintenance cost, I see it really minimal. One doesn't need anything apart from environment.yml
for the conda setup:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/environment.yml
And (optionally) a postBuild
file:
https://github.com/GeigerJ2/aiida-mybinder/blob/main/.binder/postBuild
Which I currently use to install aiida-core
from main
to be able to use verdi presto
. Possibly, we could also do fully serviceless and without conda
, installing v2.6 from pypi, then this would be even less (though, could be nice to have QE through conda
and showing some very simple materials science examples). With this little effort, having a direct playground for people in the browser without any local installation seems well worth it to me.
Then, I'd not try to use our own Dockerfile, it's discouraged by MyBinder everywhere that people ask for it and I don't really see much benefit, but much higher maintenance cost. The only part where we need to invest some time is by providing some nice examples and making sure they work, to make it actually useful. But this is something we do anyway for the doc-tutorials, so not much overhead either.
Haven't transferred the repo to aiidateam
and properly integrated the example notebooks, as I'd like to be sure that we keep it beforehand.
|
||
pip install aiida-core | ||
|
||
.. tab-item:: conda |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sphuber, thanks for the detailed write-up! I definitely see the problems with the previous vertical approach. I just skimmed through those again, and the flow is certainly hard to follow, always having tabs for different options (e.g. OSs, pip+venv / conda, etc.). The new version reads much smoother.
I just feel that it is untenable for us to map out all the possible options that can occur for different operating systems. Before, we were pretty much forced to provide some of these instructions, because without PSQL, you simply couldn't run AiiDA. However now, it is truly optional, so I think it is fine that we no longer have to provide a gazillion different possibilities of how to install it. It made the original docs unreadable in my opinion.
Yeah, I definitely agree with that!
Why don't we just add an Info
box in which we link the installation instructions on the RMQ and PSQL webpages, similar to the "Other" tab for the prerequisites, and also mention that the aiida-core.services
conda
package exists, so people can alternatively install everything via conda
. This provides at least some minimal information for the users without cluttering the flow too much? The expected possible issues with conda
could then be noted in the troubleshooting section, in which we don't have to care about a nice text flow, if we want to mention them at all?
|
||
To get a working installation of AiiDA, the following steps are required: | ||
|
||
#. :ref:`Install the Python Package <installation:guide-complete:python-package>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, now that I thought about it again, I'm also on board with that. How about adding something like:
- [Optional] Use additional services in AiiDA
- [Optional] Setup AiiDA with additional services
Just to notify the user that these points will follow in the guide? Also phrasing it in that way emphasizes that these are not installation instructions for the services, which we don't provide, but rather how to set things up for AiiDA when they are already installed. We can still add an Info box with the links to the installation instructions on their webpages and the conda
route, as I mentioned in my other comment.
In addition to this, there's a heading for RabbitMQ. We could also add a # Storage Plugins
heading, and make core.sqlite_dos
and core.psql_dos
subsections. Right now, as they are formatted as code, they don't even seem like headings at first glance.
Regarding the instructions for RMQ, we could make the table with the parameters foldable to hide most of that away initially? In principle, I think it's fine to mention RMQ in the way it is done there, no need to go through its installation.
|
||
.. _installation:guide-complete:python-package:optional-requirements: | ||
|
||
Optional requirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why this is here in the section "Install Python Package", though, I propose we move that to the end, or fold it away. Most of these additional requirements (possibly apart from atomic_tools
and notebook
) are for dev
s and not relevant for the average user. We could mention those and make the full list foldable to reduce distractions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these additional requirements (possibly apart from atomic_tools and notebook) are for devs and not relevant for the average user.
Are you sure? To me, only 3 out of 8 are just for development (docs
, pre-commit
, tests
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, maybe I worded it poorly. While the other requirements like ssh_kerberos
, rest
, and tui
might not be strictly for dev
s, I still think average users shouldn't be exposed to them that early (i.e. even before they are told how to create a profile).
ssh
setup and Kerberos are probably just chores they have to do for authentication purposes (so not too interesting), then they will probably be unfamiliar with REST APIs (and likely not something they'll need any time soon), and the tui
(while looking sexy af :3), doesn't really provide anything beyond the verdi
CLI (or does it?). Hence, my proposal to move it further down (possibly before "Validate installation"?) to have the nice, straightforward flow of:
"Install Python Package" → "Create a profile"
That being said, I don't think it's extremely important, and I'm intentionally very picky with this PR, as I think the Installation page is super important. Cheers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. I agree that it can be superfluous information for most cases, which is why I didn't include it at all in the quick install guide. However, here we are in the complete guide, so I think it is justified to provide this information. And although flow is important, I don't think it is that disruptive that we should move it elsewhere, where it would almost always be more disconnected from the most logical place. But as mentioned elsewhere, I would prefer to discuss this in person with everybody else, so we can hash all these suggestions out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree. Overall, I'm also fine with leaving it there. Let's put it on the agenda of the upcoming aiidateam meeting. Should be the last, more involved blocking PR that we should all be happy with 👍🏽
|
||
.. note:: | ||
The ``verdi presto`` command automatically checks if RabbitMQ is running on the localhost. | ||
If it can successfully connect, it configures the profile with the message broker and therefore the daemon functionality will be available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sphuber, very clear,
However, because running on a remote may be important for newcomers, I would quickly mention that, if you are interested in installing RabbitMq before'verdi presto', please check here --a link to RabbitMQ section of complete installation guide--.
If it can successfully connect, it configures the profile with the message broker and therefore the daemon functionality will be available. | |
If it can successfully connect, it configures the profile with the message broker and therefore the daemon functionality will be available. | |
As a side note you can follow :ref:`guide to install RabbitMQ <installation:guide-complete:create-profile:rabbitmq>` before creating a profile with `verdi presto`, if you'd like to have functional daemon which is responsible to submit jobs on a remote computer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sentence you added is incorrect though. You don't need to have a daemon to run on remote computers. So RabbitMQ is not required to run calculations on remote computers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's not possible to submit jobs without daemon running, maybe I'm wrong ?
But I agree it's better to stress that you can still run on remote computer but cannot submit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's not possible to submit jobs without daemon running
What is probably confusing you is the usage of "submit" in AiiDA lingo. We distinguish between running and submitting processes. The former performs the process in a local runner, whereas the latter has the process executed by a daemon runner. This is where RabbitMQ is necessary because that is how we can communicate with daemon runners. It has nothing to do with "submitting" to a scheduler on a remote computer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I see, thanks.
Then I would really change the last sentence in functionality paragraph. As it's a technical thing, it conveys the confusing message to beginners like me. Those who want easy installation might not be familiar with AiiDA lingo:
"As a result, the daemon cannot be started and processes cannot be submitted to it but can only be run locally."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any suggestions? Not sure how better to explain it. I guess I could add the literal consequence:
That is to say, it is only possible to use :meth:`aiida.engine.launch.run` to launch a process and :meth:`aiida.engine.launch.submit` is not available.
But that is even more technical and may confuse new users even more. I really think the simplest is just saying "the daemon functionality is not available" (which means starting the daemon and submitting processes to it). How else would you phrase/explain it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say it explicitly in real application. It's a important piece of information that submitting to remote computer is functional.
Maybe this ?:
++"As a result, the daemon functionality is not available. Note, although easy instillation is fully functional with submitting jobs to a remote computer, some features [like this and that] are limitted, if you plan to heavily load AiiDA with synchronous tasks we recommend install RabbitMQ :ref:guide to install RabbitMQ <installation:guide-complete:create-profile:rabbitmq>
to make daemon functional."
--"As a result, the daemon cannot be started and processes cannot be submitted to it but can only be run locally."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fair enough. I guess I just don't really understand why you even thought that running on remote computers was not supported without the daemon. Nowhere was this particular feature of AiiDA mentioned. So it seems weird to me to single this one out as explicitly stating that it is still supported without a broker. Should we also mention that caching is still supported? Or workflows, or any other feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, honestly this was a question raised during Marvel review a few days ago. Some body asked if with the new verdi presto
one can still submit jobs, we didn't know the answer, and although many of us, were present, nobody corrected it... that made me think maybe it is like that.. perhaps AiiDA lingo confusion also playing a part..
Apart from that, to be honest with you, I don't understand at all what RabbitMQ does exactly in AiiDA unfortunately (apart from being a message broker, etc....), I haven't explored that part of AiiDA yet.
|
||
.. note:: | ||
The ``verdi presto`` command can also create a profile using PostgreSQL instead of SQLite. | ||
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would mention them just to be clear
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options. | |
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options: e.g. `--postgres-hostname`, `--postgres-port`, `--postgres-username` and `--postgres-password`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember @mbercx objections to having these options for verdi presto
in the first place, which I also agree with. Ideally, we just have --use-postgres
here, without all the other --postgres-...
stuff. I think the default db name (which is also currently a missing option --postgres-name
) and db user seem are sensible, and if people want to configure more things, they should use verdi profile setup
. Don't want to be too much of a scope creep here, but we should probably agree on this, as, once we add options and release, we have to keep them due to backwards compatibility...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of things here:
I remember @mbercx objections to having these options for verdi presto in the first place, which I also agree with.
I think you are thinking of his objection to having the RabbitMQ connection configuration options in verdi profile setup
. This is addressed in #6480 .
What he did remark with regards to the --postgres-*
options is that he thought it wasn't consistent with the same options for verdi profile setup core.psql_dos
where they are named --database-*
. The problem is that we cannot change these for the core.psql_dos
as they are taken from the pydantic model fields and are already part in the actual config.json
. If we rename them, all configs need to be migrated as well. Alternative is to rename the options for verdi presto
to use --database-*
instead, but I find that less ideal because --postgres-
is clearer.
I think the default db name (which is also currently a missing option --postgres-name)
This is intentional. If you want the functionality that automatically creates a PSQL database, then why provide an option that let's you specify the name. In what use case would this be necessary?
and if people want to configure more things, they should use verdi profile setup
But there they won't have the functionality that the database is created automatically, as you yourself would also want to keep. The options I added to verdi presto
are necessary to keep the automatic database creation functionality useful, as the connection details to the PSQL server are not always postgres:postgres@localhost:5432
but the user, hostname and port can change. If the user cannot specify this, the command is useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are thinking of his objection to having the RabbitMQ connection configuration options in verdi profile setup. This is addressed in #6480.
No, I talked with him the other day, and it was about the --postgres-*
options of verdi presto
. Though, let's maybe wait until he responds, as I don't want to put (even more ^^) words in his mouth.
Though, I also think that the fact that these are always shown is suboptimal, as they are not relevant for anybody using SQLite and might just cause confusion for new users. And I understand it's not possible to generate them dynamically like it is done for verdi profile setup core.psql_dos
, which would be the ideal solution I suppose.
What he did remark with regards to the --postgres-* options is that he thought it wasn't consistent with the same options for verdi profile setup core.psql_dos where they are named --database-*.
Yes, I also noted this in my other PR.
The problem is that we cannot change these for the core.psql_dos as they are taken from the pydantic model fields and are already part in the actual config.json. If we rename them, all configs need to be migrated as well.
That makes sense, though, thanks for the clarification!
This is intentional. If you want the functionality that automatically creates a PSQL database, then why provide an option that let's you specify the name. In what use case would this be necessary?
My reasoning is that, if one can specify all the other options for the db setup, then why not be able to also specify the name of the created db? I don't see the difference to the other options here, am I missing something?
But there they won't have the functionality that the database is created automatically, as you yourself would also want to keep.
Fair enough, I didn't consider that there is no option to generate it automatically for verdi profile setup core.psql_dos
. That being said, why not add an option there which enables this? As we ask for all these options anyway, why not give the user the choice to let it be created by AiiDA on-the-fly?
The options I added to verdi presto are necessary to keep the automatic database creation functionality useful, as the connection details to the PSQL server are not always postgres:postgres@localhost:5432 but the user, hostname and port can change. If the user cannot specify this, the command is useless.
But is the command really "not useful" without these options? I guess for 99% of the intended users of the command, new users, they will have this setup, and will probably not even be aware of the existence of database user, hostname, and port options. Personally, I never found myself in a situation where I had to change these settings (though, my scope might be too narrow here). Assuming we have a flag for automatic database creation for verdi profile setup core.psql_dos
as I propose above, removing the options from verdi presto
and just running with the defaults seems sensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning is that, if one can specify all the other options for the db setup, then why not be able to also specify the name of the created db? I don't see the difference to the other options here, am I missing something?
The whole point of this functionality in verdi presto
is that it can automatically create the required database (and user) in a PostgreSQL server. Typically, the user doesn't care about the details and so we just generate one. However, the user does need to be able to specify where this server is. Although in many cases it may be running on the localhost on port 5432, that is not always the case. The same goes for the username and password with which to connect to PSQL. Although in some cases the defaults may work, or by resorting to using sudo, in some cases the user needs to specify a specific user account which has the rights to create a new database (and optionally a user).
Fair enough, I didn't consider that there is no option to generate it automatically for verdi profile setup core.psql_dos. That being said, why not add an option there which enables this? As we ask for all these options anyway, why not give the user the choice to let it be created by AiiDA on-the-fly?
Because the commands are generated on-the-fly for each storage plugin and although the storage plugins can customize which options are added, they cannot provide custom logic as would be required for the auto-resource creation for PSQL.
But is the command really "not useful" without these options? I guess for 99% of the intended users of the command, new users, they will have this setup, and will probably not even be aware of the existence of database user, hostname, and port options. Personally, I never found myself in a situation where I had to change these settings (though, my scope might be too narrow here). Assuming we have a flag for automatic database creation for verdi profile setup core.psql_dos as I propose above, removing the options from verdi presto and just running with the defaults seems sensible.
I guess this is the crux of the disagreement/misunderstanding: if we were able to add the auto-generation functionality to verdi profile core.psql_dos
then yes I would agree that it doesn't make sense to add it to verdi presto
. However, this is currently not the case, and that is the reason I added it to verdi presto
(which essentially replaces verdi quicksetup
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is the crux of the disagreement/misunderstanding: if we were able to add the auto-generation functionality to
verdi profile core.psql_dos
then yes I would agree that it doesn't make sense to add it toverdi presto
.
Yeah, I agree that's the issue... if there's no other way to have the automatic creation, then it should be at least under verdi presto
. Maybe @mbercx has some other idea here?
This is intentional. If you want the functionality that automatically creates a PSQL database, then why provide an option that let's you specify the name. In what use case would this be necessary?
The use case would be that maybe I'm too lazy to generate a PSQL database manually beforehand (e.g. because I don't remember the sequence of commands to do that), so I just let verdi presto
do it for me, while I still have a specific db name in mind that I would want it to have? I understand that the other options are necessary to be able to connect to the PSQL-db, while the db name is not essential. Though, I don't see a problem with also allowing to customize that, as it seems unnecessarily restrictive otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I still have a specific db name in mind that I would want it to have?
Why though? The idea from the beginning was to keep verdi presto
as automatic and simple as possible, only adding what was necessary. Initially that meant there was no automatic database creation. But you and other users still used that functionality from verdi quicksetup
, which I agree makes sense. So I added that functionality, but only the bare minimum. I really think this is important. Because why when we allow specifying the database name, would we also not allow specifying the database username and password? This is unnecessary complexity that detracts from verdi presto
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my initial gut reaction towards seeing the --postgres-X
options was more that I wanted to keep verdi presto
"clean" and "simple", without adding too many options, in line with @sphuber's last comment above. I also was wondering what we should do if we add another storage backend in the future that might need options.
@sphuber I think you saw this reaction live during our first configure-rabbitmq
meeting. But I do have a lot of objections about many things, so I understand how they can be mixed up. ^^
Specifying the database name
The main reason I started using verdi quicksetup
again was because I could specify the database name, which allowed me to easily identify the database when I wanted to do anything directly in PostgreSQL (e.g. for backups, hacking codes/computers, checking DB size, ...). So one use case for being able to specify the database name is for more advanced users that like the convenience of verdi presto
but still want to have easily identifiable database names.
Another use case was users that can't create databases themselves, but have an admin who creates those for them (there were 2 users in ETH who were faced with this problem). Interestingly, verdi quicksetup
also helped here, since if you specify an existing database, it simply warns that the database exists but still uses it.
Thinking about it some more though, the verdi profile setup core.psql_dos
command using non-interactive mode only requires:
--profile
--database-username
--database-name
--database-password
--repository-uri
The first four are fine to ask, and the username and password could perhaps have stored defaults (similar to user email etc)? The --repository-uri
could also have the same default as verdi presto
, and should probably be easier to specify (now you need to prepend it with file://
, which is not very user-friendly).
So I think it's fine to ask users that want to specify their database to run createdb
and use verdi profile setup core.psql_dos
, but we can still work towards making this smoother.
The --postgres-X
options
I'm still leaning towards keeping only --use-postgres
here, to keep the verdi presto
API as clean as possible. One very nice aspect of verdi presto
is the fact that it tries to hide complexity from the user if it can. For RabbitMQ this works very well, if the connection fails no broker is used and now it's possible for the user to configure it later. For PostgreSQL this of course isn't possible, since you need to create the database when creating the profile. So I see why @sphuber feels it's necessary to add the minimum options to verdi presto
.
But is it really necessary? If --use-postgres
is set and the database creation fails, the user is prompted for the required information:
❯ verdi presto --use-postgres
Unable to autodetect postgres setup.
Please provide PostgreSQL connection info:
postgres host [localhost]:
postgres port [5432]:
postgres super user [postgres]:
database [template1]:
postgres password of {dsn_new["user"]} []:
Although here it does ask for the database name and the password of user that will be created, but not the superuser password?
Conclusion
I would keep the verdi presto
API as clean as possible, but:
- Improve the interactive experience when the defaults fail to automatically create the database.
- Store the defaults as provided by the users, so future
verdi presto
commands work smoothly. - Allow the user to configure the defaults with
verdi config
(I'm not sure where these are stored? I don't see them inverdi config list
).
However, I'm still not 100% convinced of this approach either. My main concern with keeping the options is that once we release, we can't remove them anymore. And I think we all agree we want to keep verdi presto
as simple as possible.
Just one more quick comment: |
@@ -0,0 +1,83 @@ | |||
.. _installation:guide-quick: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: is there a reason why these are in separate folders instead of just a single .rst
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was because I was anticipating that they would have several files, especially for the complete guide. Files for RabbitMQ and PSQL for example. If we end up not using it, I will convert them to single files
|
||
.. note:: | ||
The ``verdi presto`` command can also create a profile using PostgreSQL instead of SQLite. | ||
Specify the ``--use-postgres`` flag and provide the connection information to the PostgreSQL service using the various ``--postgres`` options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my initial gut reaction towards seeing the --postgres-X
options was more that I wanted to keep verdi presto
"clean" and "simple", without adding too many options, in line with @sphuber's last comment above. I also was wondering what we should do if we add another storage backend in the future that might need options.
@sphuber I think you saw this reaction live during our first configure-rabbitmq
meeting. But I do have a lot of objections about many things, so I understand how they can be mixed up. ^^
Specifying the database name
The main reason I started using verdi quicksetup
again was because I could specify the database name, which allowed me to easily identify the database when I wanted to do anything directly in PostgreSQL (e.g. for backups, hacking codes/computers, checking DB size, ...). So one use case for being able to specify the database name is for more advanced users that like the convenience of verdi presto
but still want to have easily identifiable database names.
Another use case was users that can't create databases themselves, but have an admin who creates those for them (there were 2 users in ETH who were faced with this problem). Interestingly, verdi quicksetup
also helped here, since if you specify an existing database, it simply warns that the database exists but still uses it.
Thinking about it some more though, the verdi profile setup core.psql_dos
command using non-interactive mode only requires:
--profile
--database-username
--database-name
--database-password
--repository-uri
The first four are fine to ask, and the username and password could perhaps have stored defaults (similar to user email etc)? The --repository-uri
could also have the same default as verdi presto
, and should probably be easier to specify (now you need to prepend it with file://
, which is not very user-friendly).
So I think it's fine to ask users that want to specify their database to run createdb
and use verdi profile setup core.psql_dos
, but we can still work towards making this smoother.
The --postgres-X
options
I'm still leaning towards keeping only --use-postgres
here, to keep the verdi presto
API as clean as possible. One very nice aspect of verdi presto
is the fact that it tries to hide complexity from the user if it can. For RabbitMQ this works very well, if the connection fails no broker is used and now it's possible for the user to configure it later. For PostgreSQL this of course isn't possible, since you need to create the database when creating the profile. So I see why @sphuber feels it's necessary to add the minimum options to verdi presto
.
But is it really necessary? If --use-postgres
is set and the database creation fails, the user is prompted for the required information:
❯ verdi presto --use-postgres
Unable to autodetect postgres setup.
Please provide PostgreSQL connection info:
postgres host [localhost]:
postgres port [5432]:
postgres super user [postgres]:
database [template1]:
postgres password of {dsn_new["user"]} []:
Although here it does ask for the database name and the password of user that will be created, but not the superuser password?
Conclusion
I would keep the verdi presto
API as clean as possible, but:
- Improve the interactive experience when the defaults fail to automatically create the database.
- Store the defaults as provided by the users, so future
verdi presto
commands work smoothly. - Allow the user to configure the defaults with
verdi config
(I'm not sure where these are stored? I don't see them inverdi config list
).
However, I'm still not 100% convinced of this approach either. My main concern with keeping the options is that once we release, we can't remove them anymore. And I think we all agree we want to keep verdi presto
as simple as possible.
I think we may need a meeting to discuss in person, because going back-and-forth on github doesn't seem very efficient.
I fully agree. And the only options that I added are those that are necessary.
We shouldn't. In my opinion, we are adding this functionality just for the
This seems to be contradictory. On the one hand you want to keep
Just run
If the database already exists, why use
This should be easy to fix.
The
I don't think relying on prompts is good design. At least it should not be required. There should always be a non-interactive option. I don't see the problem though. The options are exactly that, optional. They only show up if the user looks at the help text. How is that "confusing"? I could see the point with the old setup commands where all options would always be prompted for, even if users often would just want to use the results. But here, they are not even prompted for it. They just show up in the help text. I don't see a problem with that if it means that it provides all necessary options for all potential variants of PSQL servers. |
Makes sense, I agree. 👍
Haha, completely fair. It's also why I argued that in the end making the usage of
Perhaps, we are probably a bit too hung up on these extra options in our zeal of keeping |
f262916
to
dc2f84f
Compare
The `verdi setup` and `verdi quicksetup` commands have been deprecated and replaced by `verdi profile setup` and `verdi presto`. The installation docs were heavily outdated and the flow was scattered. The biggest change is that there now is a "quick install guide" that relies on `verdi presto` to provide an install route that is fool proof and will work on almost any system in a minimal amount of commands. Then there is the "complete installation guide" that provides all the details necessary to fully customize an installation.
dc2f84f
to
cdf3f18
Compare
The `verdi setup` and `verdi quicksetup` commands have been deprecated and replaced by `verdi profile setup` and `verdi presto`. The installation docs were heavily outdated and the flow was scattered. The biggest change is that there now is a "quick install guide" that relies on `verdi presto` to provide an install route that is fool proof and will work on almost any system in a minimal amount of commands. Then there is the "complete installation guide" that provides all the details necessary to fully customize an installation.
The
verdi setup
andverdi quicksetup
commands have been deprecated and replaced byverdi profile setup
andverdi presto
. The installation docs were heavily outdated and the flow was scattered.The biggest change is that there now is a "quick install guide" that relies on
verdi presto
to provide an install route that is fool proof and will work on almost any system in a minimal amount of commands.Then there is the "complete installation guide" that provides all the details necessary to fully customize an installation.