Lazy pydantic import #6275

Merged: 10 commits, Feb 8, 2024

Conversation

danielhollas
Collaborator

@danielhollas danielhollas commented Feb 4, 2024

pydantic adds a lot of time to aiida startup, which is especially detrimental to tab completion.
Here I test the approach of deferring import of aiida.manage.configuration.config until really needed, which in turn defers pydantic.

Upcoming usage of pydantic from #6255 seems to be centered in aiida.orm which is not imported by default in verdi so this PR still seems worth it.

This is in principle a breaking change, since we no longer export Config from the aiida.manage.configuration module. But I guess it is not meant to be manipulated directly by users anyway.
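As an illustration of the deferral idea described above, here is a minimal sketch rather than the code from this PR. It uses the small stdlib module colorsys as a stand-in for a slow dependency such as pydantic; the helper names `is_loaded` and `red_as_hsv` are hypothetical.

```python
import sys

# Stand-in for a slow third-party dependency such as pydantic (assumption:
# colorsys is used here only because it is small and rarely pre-loaded).
HEAVY_MODULE = "colorsys"


def is_loaded(name: str) -> bool:
    """Return True if the module has already been imported in this process."""
    return name in sys.modules


def red_as_hsv() -> tuple:
    """Hypothetical helper that needs the heavy module only when called."""
    import colorsys  # deferred: executed on first call, not at module import

    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)


if __name__ == "__main__":
    print("loaded before call:", is_loaded(HEAVY_MODULE))
    red_as_hsv()
    print("loaded after call:", is_loaded(HEAVY_MODULE))
```

The import cost is paid only by code paths that actually call the helper, which is the property tab-completion benefits from.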

@danielhollas
Collaborator Author

Some benchmarks using verdi --version as a proxy for tab-completion, with pydantic v2.6.0.
I am seeing ~35ms speedup on my new fancy NVMe drive. On my old laptop the difference would likely be much bigger.

main

$ hyperfine -w 5 'verdi --version' 
Benchmark 1: verdi --version
  Time (mean ± σ):     101.4 ms ±   2.3 ms    [User: 83.6 ms, System: 16.5 ms]
  Range (min … max):    97.6 ms … 105.6 ms    30 runs

this PR

Benchmark 1: verdi --version
  Time (mean ± σ):      65.2 ms ±   2.1 ms    [User: 50.9 ms, System: 13.2 ms]
  Range (min … max):    62.2 ms … 69.8 ms    45 runs

@danielhollas danielhollas marked this pull request as ready for review February 4, 2024 03:03
@edan-bainglass
Member

Thank you @danielhollas for addressing this. I have a question - how much does pydantic load-time affect AiiDAlab? Is this one of the sources of lag in app load time?

@danielhollas
Collaborator Author

@edan-bainglass it is not, since pydantic has only been used by aiida-core since version 2.5, and we haven't even released an aiidalab image with that version.

That being said, version 2.5 contains a lot of work that improved import time so it should result in noticeable improvement. I'll let you know once we release the image, but it might take some time since we need to resolve some package version incompatibilities.

@danielhollas
Collaborator Author

I'll also note that this particular PR will not help with most of aiida operations (or AiiDAlab load times) since we will need to load pydantic as soon as we load the AiiDA profile.

CC @sphuber this is ready for review. Happy to hear your thoughts. This PR kind of assumes that the use of pydantic will stay somewhat localized to the aiida.orm module. If you have other plans, then perhaps this is not worth it.

@sphuber
Contributor

sphuber commented Feb 7, 2024

CC @sphuber this is ready for review. Happy to hear your thoughts. This PR kind of assumes that the use of pydantic will stay somewhat localized to the aiida.orm module. If you have other plans, then perhaps this is not worth it.

Well, it already is used outside of aiida.orm, e.g.:

from pydantic import BaseModel, Field

And it is in aiida.cmdline

from pydantic_core import PydanticUndefined

but I made sure to keep imports inside methods whenever possible. However, I am not 100% sure that this is not still evaluated during tab-completion, because one of the main motivations for adopting pydantic is to have verdi add subcommands dynamically based on which plugins are installed. See #6190

In #6255 we go even further and essentially use pydantic in almost every aiida.orm submodule. Would those automatically be handled by the changes in this PR? Or would special care still have to be taken? Is there any way we can test this for regression reliably? There currently is a very basic test in the CI that we added years back, which simply measures the run time of verdi in a loop. But it is quite fragile if we set the limit too close to the ideal time (which is also strongly system-specific), and so we risk false positives.

This is in principle a breaking change, since we no longer export Config from the aiida.manage.configuration module. But I guess it is not meant to be manipulated directly by users anyway.

Maybe not users, but third-party applications may very well be using it directly. For example, would AiiDAlab maybe not be affected? Surely they will operate on the Config, for example to set options etc and get profile information. Is this the only way of deferring the import of pydantic?

@danielhollas
Collaborator Author

Thanks for taking a look @sphuber!

Well, it already is used outside of aiida.orm, e.g.:

Right, but I double-checked that in those cases the imports are inlined, so I assumed it would be fine.
I have now verified this assumption manually: pydantic is indeed not imported during tab-completion (checked by adding raise ValueError in pydantic/__init__.py).

Would those automatically be handled by the changes in this PR?

We already verify that aiida.orm is not imported during verdi startup by running verdi devel check-load-time, so provided that you didn't add a pydantic import outside of aiida.orm, that PR should be fine.

Is there any way we can test this for regression reliably?

Indeed we can, that's what verdi devel check-undesired-imports is for! Unfortunately, it does not work in this case, because actually running verdi devel loads the configuration. In the tab-completion case, we have a special monkey-patch that prevents evaluation of dynamic default values that I added in #6144.

So instead I've added a regression test that calls tab-completion internally.
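A rough sketch of how such an import regression check can be written in general follows. It is not the test added in this PR: it runs a snippet in a fresh interpreter and inspects `sys.modules` afterwards. The helper name `modules_loaded_by` is hypothetical, and colorsys again stands in for pydantic.

```python
import subprocess
import sys


def modules_loaded_by(code: str) -> set[str]:
    """Run `code` in a fresh interpreter and return the names in sys.modules.

    Assumption: `code` itself prints nothing, so stdout contains only the
    module names emitted by the probe appended below.
    """
    probe = code + "\nimport sys\nprint(*sorted(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # Module names contain no whitespace, so a plain split is sufficient.
    return set(result.stdout.split())


if __name__ == "__main__":
    loaded = modules_loaded_by("import colorsys")
    print("colorsys" in loaded)
```

Running in a subprocess keeps the check independent of whatever the test runner has already imported, at the cost of one interpreter startup per check.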

Maybe not users, but third-party applications may very well be using it directly. For example, would AiiDAlab maybe not be affected?

I don't think so? In AiiDAlab we mostly only use load_profile. I would assume third-party code would similarly only use the helper functions defined in aiida/manage/configuration/__init__.py such as get_config() / load_config().

Is this the only way of deferring the import of pydantic?

Not sure how to answer this. 😅 The Config class is derived from the pydantic BaseModel, and so pydantic needs to be imported when the Config class itself (not its instances) is built. I don't see a way to make the class Config available in aiida.manage while also not importing pydantic.

Quoting from https://docs.python.org/3/library/sys.html

[sys.modules] is a dictionary that maps module names to modules
which have already been loaded. This can be manipulated
to force reloading of modules and other tricks.
However, replacing the dictionary will not necessarily work
as expected and deleting essential items from the dictionary may cause Python to fail...

Let's just ignore the last sentence :-)
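For readers unfamiliar with the trick the quoted documentation refers to, here is a minimal in-process sketch, under the assumption that only non-essential modules are removed. The helper name `unload` is hypothetical and this is not the code from the PR.

```python
import sys


def unload(prefix: str) -> list:
    """Remove a module and its submodules from sys.modules.

    Returns the removed names. The next `import` of the module will
    execute it again, which makes a later import detectable.
    """
    removed = [
        name
        for name in list(sys.modules)  # copy keys: the dict is mutated below
        if name == prefix or name.startswith(prefix + ".")
    ]
    for name in removed:
        del sys.modules[name]
    return removed


if __name__ == "__main__":
    import colorsys  # stand-in for pydantic; small and safe to unload

    print(unload("colorsys"))
    print("colorsys" in sys.modules)
```

Objects already bound to the old module stay alive, so this is suitable for tests but fragile for anything else.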
@danielhollas
Collaborator Author

By the way, I've just found out about verdi devel play, what a beautiful easter egg. 😂 🎶

@sphuber
Contributor

sphuber commented Feb 8, 2024

By the way, I've just found out about verdi devel play, what a beautiful easter egg. 😂 🎶

It's not the only easter egg in verdi (hint, hint) ;)

Contributor

@sphuber sphuber left a comment


Thanks @danielhollas

Not sure how to answer this. 😅 The Config class is derived from the pydantic BaseModel, and so pydantic needs to be imported when the Config class itself (not its instances) is built. I don't see a way to make the class Config available in aiida.manage while also not importing pydantic.

I now remember looking into this and trying but coming up short. I think indeed that it is not possible. So I guess your solution is the best we can do for now.

I also think breaking the import is acceptable, so let's continue with these changes.

Indeed we can, that's what verdi devel check-undesired-imports is for! Unfortunately, it does not work in this case, because actually running verdi devel loads the configuration. In the tab-completion case, we have a special monkey-patch that prevents evaluation of dynamic default values that I added in #6144. So instead I've added a regression test that calls tab-completion internally.

Very nice. I am wondering if the verdi devel check-undesired-imports is now superfluous as it is a subset of the unit test you added? I think technically there is still a difference in code path between merely tab-completing, and actually invoking a command. As you say, some of the command parameters were changed to lazily evaluate their default value, such that they are only executed when the command is actually called. What we are really interested in is that the tab-complete is responsive. So I think we can just add the modules from verdi devel check-undesired-imports to your unit test and get rid of that command (and invocation in the CI scripts). Right?

@danielhollas
Collaborator Author

Very nice. I am wondering if the verdi devel check-undesired-imports is now superfluous as it is a subset of the unit test you added? I think technically there is still a difference in code path between merely tab-completing, and actually invoking a command. As you say, some of the command parameters were changed to lazily evaluate their default value, such that they are only executed when the command is actually called. What we are really interested in is that the tab-complete is responsive. So I think we can just add the modules from verdi devel check-undesired-imports to your unit test and get rid of that command (and invocation in the CI scripts). Right?

I've been thinking about this as well. As you mention, verdi devel check-undesired-imports provides stronger guarantees. I've looked at the list of blacklisted modules, and for most of them I think this stronger guarantee is warranted, since many of them incur a significant import cost. In other words, I think it's reasonable to expect that verdi commands that can be fast are actually fast (e.g. asyncio should be loaded only when really needed).
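A hedged sketch of what a blacklist check of this kind boils down to is shown below. The function name `find_undesired` and the example blacklist are illustrative assumptions, not the actual list or implementation in aiida-core.

```python
import sys


def find_undesired(loaded, blacklist):
    """Return the blacklisted top-level packages present among `loaded`.

    `loaded` is an iterable of dotted module names (e.g. sys.modules);
    a submodule such as "asyncio.events" counts as a hit for "asyncio".
    """
    hits = set()
    for name in loaded:
        top_level = name.split(".")[0]
        if top_level in blacklist:
            hits.add(top_level)
    return sorted(hits)


if __name__ == "__main__":
    # Illustrative blacklist only; the real list lives in aiida-core.
    example_blacklist = {"asyncio", "pydantic"}
    print(find_undesired(list(sys.modules), example_blacklist))
```

A command could fail with a non-zero exit code whenever the returned list is non-empty, which is cheaper and less system-dependent than a timing threshold.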

@danielhollas danielhollas requested a review from sphuber February 8, 2024 17:15
@sphuber sphuber merged commit 9524cda into aiidateam:main Feb 8, 2024
19 checks passed
@danielhollas danielhollas deleted the lazy-pydantic branch February 8, 2024 23:11
3 participants