Cache is_satisfied_by #12453
Conversation
```python
def __init__(self, ireq: InstallRequirement) -> None:
    assert ireq.link is None, "This is a link, not a specifier"
    self._ireq = ireq
    self._extras = frozenset(canonicalize_name(e) for e in self._ireq.extras)
    KeyBasedCompareMixin.__init__(
        self, key=str(ireq), defining_class=SpecifierRequirement
    )
```
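For readers unfamiliar with the mixin, the key-based comparison pattern used here can be sketched as follows. This is an illustrative reimplementation, not pip's actual `KeyBasedCompareMixin`:

```python
# Illustrative sketch of a key-based comparison mixin: equality and hashing
# are delegated to a precomputed key, so instances can be used as dict/cache
# keys. Not pip's actual implementation.
class KeyBasedCompareMixin:
    """Provide __eq__ and __hash__ based on a precomputed compare key."""

    def __init__(self, key, defining_class):
        self._compare_key = key
        self._defining_class = defining_class

    def __hash__(self):
        return hash(self._compare_key)

    def __eq__(self, other):
        if not isinstance(other, self._defining_class):
            return NotImplemented
        return self._compare_key == other._compare_key


class SpecifierRequirement(KeyBasedCompareMixin):
    def __init__(self, spec: str) -> None:
        super().__init__(key=spec, defining_class=SpecifierRequirement)
```

With `key=str(ireq)`, two requirements with the same string form compare equal and hash alike, which is exactly what a cache keyed on requirements needs.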
Instead of using `str` here, we may want to make `InstallRequirement` hashable and comparable.

The root cause of the performance issue is likely related to sarugaku/resolvelib#147.
+1 on the general approach. There's a risk of incorrect results caused by two requirements which compare as equal but actually aren't (for example, two URL requirements with the same target but different auth data), but I'm assuming that risk is small enough to be acceptable.
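The risk described above can be made concrete with a toy example (the `KeyedRequirement` class and its fields are hypothetical, not pip code): if the compare key is derived only from the URL and omits auth data, two genuinely different requirements collapse into one cache entry.

```python
# Hypothetical illustration of the collision risk: the comparison key omits
# a distinguishing detail (auth data), so two requirements that differ in
# practice compare and hash as equal.
class KeyedRequirement:
    def __init__(self, url, auth=None):
        self.url = url
        self.auth = auth   # NOT part of the compare key
        self._key = url    # key derived from the URL only

    def __hash__(self):
        return hash(self._key)

    def __eq__(self, other):
        return isinstance(other, KeyedRequirement) and self._key == other._key


r1 = KeyedRequirement("https://example.com/pkg.tar.gz", auth="alice:s3cret")
r2 = KeyedRequirement("https://example.com/pkg.tar.gz", auth="bob:hunter2")
# Same target, different auth: a key-based cache would treat these as one entry.
```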
```python
KeyBasedCompareMixin.__init__(
    self, key=(self.name, self.version), defining_class=self.__class__
)
```
Here it might be safer to hash and compare on `self.dist`.
Although this may change the logic?
What's the

It's from
Yes, resolvelib's algorithm checks at each step whether each requirement has been satisfied: https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L217, called here https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L409 and here https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L443. So for 500 requirements that are not top-level, there will be at least 500 steps, where each step checks whether each of the 500 requirements has already been resolved (at least 500 × 500 = 250,000 checks), and for yet-unpinned requirements this also involves checking each requirement's own requirements.

Also, I believe this PR potentially fixes this user's issue #12314 (though reproducing the user's results has been challenging).
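The blow-up described above can be sketched numerically: with 500 resolver steps each re-checking 500 requirements, an uncached `is_satisfied_by` body runs 250,000 times, while a cached one runs once per unique (requirement, candidate) pair. This is a toy model of the loop, not resolvelib's actual code.

```python
from functools import lru_cache

calls = 0


@lru_cache(maxsize=None)
def is_satisfied_by(requirement, candidate):
    """Stand-in for the real check; count how often the body actually runs."""
    global calls
    calls += 1
    return True


requirements = [f"req-{i}" for i in range(500)]
candidate = "pinned-candidate"

# Toy model of the resolvelib loop: every step re-checks every requirement.
for _step in range(500):
    for req in requirements:
        is_satisfied_by(req, candidate)

info = is_satisfied_by.cache_info()
# 250,000 wrapper calls in total, but the body ran only once per unique pair.
```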
By the way, for this specific use case I do have a separate idea that could fix it algorithmically: sarugaku/resolvelib#146. But I've not done any work towards a PR yet, so I can't compare.
```python
)

@lru_cache(maxsize=None)
```
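One constraint of `@lru_cache` worth noting here: every argument (including `self` on methods) must be hashable, which is why the review discussion focuses on making requirements and candidates hashable. A minimal demonstration:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def check(key):
    # Stand-in for a satisfaction check keyed on a hashable value.
    return len(key)


hashable_ok = check(("name", "1.0")) == 2  # tuples are hashable: cached fine

try:
    check(["name", "1.0"])  # lists are unhashable: lru_cache raises TypeError
    unhashable_rejected = False
except TypeError:
    unhashable_rejected = True
```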
It would be good to check how much this increases memory consumption in a worst-case scenario, i.e. when `ResolutionTooDeep` is reached.

This set of requirements can reach that: `sphinx sphinx-rtd-theme sphinx-toolbox myst-parser sphinxcontrib-bibtex nbsphinx`. I can test this in the new year if you'd like.
I experimented with a few different requirements. Looking at a dry install of `apache-airflow[all]`, I saw an increase of peak memory usage from 298 MB to 306 MB, and the time to completion dropped from ~2m 41s to ~2m 12s.

Personally, this seems well worth the memory/time trade-off to me (and looking at the memray flamegraph of these installs, there are potentially big areas for improvement in other parts of pip's codebase to use less memory).
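The numbers above were gathered with memray; for a rough stdlib-only peak-memory check, `tracemalloc` can be used instead. This is a sketch with a stand-in workload, not how the figures above were measured:

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload; in practice this would wrap the pip resolve() call.
data = [str(i) * 100 for i in range(10_000)]

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```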
Force-pushed from 4f6cc32 to 98c5cae
Is there a reason this is still in draft mode? It would be great to see this land this year.
@notatallshaw thanks for the ping. I think this was ready, but it needs double-checking for correctness, so I marked it ready for review. I'll add the news entry soon.
Testing performance today on Python 3.12 with all packages cached (I ran the command twice to populate the cache): this is a huge performance improvement for real-world large dependency trees.
Force-pushed from 98c5cae to 705ce7d
@pradyunsg I tentatively added this to 24.1. Feel free to postpone if you are not comfortable with this. There will be a minor conflict with #12300, which I'll be happy to resolve, whichever of the two is merged first.
Oh, wait
Sorry about that @sbidoul! I suppose in hindsight it would've been better to not remove that mixin.
No worries. I already pushed the update.
This is a performance improvement for a presumably common use case, namely the installation of a project with locked dependencies using `pip install -c requirements.txt -e .`, in an environment where most dependencies are already installed.

More precisely, the case I'm testing is a ~500-line `requirements.txt`, running the above command in an environment where all dependencies are already installed. This is typically done by developers when pulling a new version of the project, to ensure their environment is up-to-date.

When running this under py-spy, it appears that `PipProvider.is_satisfied_by` largely dominates the 50 seconds spent in `resolve()` (out of a total 55 sec execution time). Some tracing showed that `is_satisfied_by` is repeatedly called for the same requirement and candidate. So I experimented with some caching.

The result of this PR is a 40-second saving on the 50 sec `resolve()`. The total execution time went down from 55 sec to 15 sec.

I'm opening this as a draft as there are some possible improvements, but it's ready for comments on the general approach already.
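The caching described above can be sketched as a provider that memoizes satisfaction checks per (requirement, candidate) pair. This is an illustrative toy model, not pip's actual `PipProvider`; the `pinned_versions` dict and the tuple-shaped requirements are assumptions for the demo.

```python
from functools import lru_cache


class CachingProvider:
    """Toy provider that memoizes is_satisfied_by results.

    Requirements are modeled as (name, version) tuples so they are hashable,
    which is the precondition discussed in the review above. Note that the
    lru_cache keeps a reference to the provider for its lifetime, which is
    acceptable for a provider that only lives for one resolution.
    """

    def __init__(self, pinned_versions):
        self._pinned = pinned_versions  # name -> version already pinned
        self.real_checks = 0            # how often the body actually ran

    @lru_cache(maxsize=None)
    def is_satisfied_by(self, requirement, candidate):
        self.real_checks += 1
        name, version = requirement
        return self._pinned.get(name) == version and candidate == name
```

Repeated calls with the same pair hit the cache, so the expensive check runs once per unique pair instead of once per resolver step.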