Cache is_satisfied_by #12453

sbidoul · 2023-12-28T15:46:12Z

This is a performance improvement for a presumably common use case, namely the installation of a project with locked dependencies using pip install -c requirements.txt -e ., in an environment where most dependencies are already installed.

More precisely, the case I'm testing is a ~500-line requirements.txt, running the above command in an environment where all dependencies are already installed. This is typically done by developers when pulling a new version of the project, to ensure their environment is up-to-date.

When running this under py-spy, PipProvider.is_satisfied_by largely dominates the 50 seconds spent in resolve() (out of a total execution time of 55 seconds).

Some tracing showed that is_satisfied_by is repeatedly called for the same requirement and candidate.

So I experimented with some caching.

The result of this PR is a 40 seconds saving on the 50 sec resolve(). The total execution time went down to 15 sec from 55.
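As a rough illustration of the idea (not the code in this PR), the sketch below memoizes a satisfaction check with functools.lru_cache. The Requirement, Candidate and Provider classes are hypothetical stand-ins, and the only assumption is that both arguments are hashable and compare by value.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Requirement:
    """Hypothetical hashable stand-in for a pip requirement."""
    name: str
    minimum: tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical hashable stand-in for an installed candidate."""
    name: str
    version: tuple


class Provider:
    """Toy provider for illustration; this is not pip's PipProvider."""

    def __init__(self) -> None:
        self.real_checks = 0  # how often the expensive check actually runs

    def _check(self, requirement: Requirement, candidate: Candidate) -> bool:
        # Stand-in for the expensive specifier match.
        self.real_checks += 1
        return (
            requirement.name == candidate.name
            and candidate.version >= requirement.minimum
        )

    @lru_cache(maxsize=None)
    def is_satisfied_by(self, requirement: Requirement, candidate: Candidate) -> bool:
        # Repeated calls with an equal (requirement, candidate) pair hit the cache.
        return self._check(requirement, candidate)


provider = Provider()
req = Requirement("attrs", (23, 1))
cand = Candidate("attrs", (23, 2))
results = [provider.is_satisfied_by(req, cand) for _ in range(1000)]
```

After the 1000 calls, provider.real_checks is 1: the expensive check ran once and the rest were cache lookups. Note that lru_cache on a method also keys on self, so the cache keeps the provider instance alive.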

I'm opening as draft as there are some possible improvements, but it's ready for comments on the general approach already.

sbidoul · 2023-12-28T15:49:02Z

src/pip/_internal/resolution/resolvelib/requirements.py

    def __init__(self, ireq: InstallRequirement) -> None:
        assert ireq.link is None, "This is a link, not a specifier"
        self._ireq = ireq
        self._extras = frozenset(canonicalize_name(e) for e in self._ireq.extras)
+        KeyBasedCompareMixin.__init__(
+            self, key=str(ireq), defining_class=SpecifierRequirement
+        )


Instead of using str here we may want to make InstallRequirement hashable and comparable.
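For readers unfamiliar with the pattern in the diff above, here is a minimal sketch of a key-based comparison mixin. KeyCompareMixin and SpecRequirement are hypothetical names written for this illustration; this is not pip's KeyBasedCompareMixin, only the general shape of delegating equality and hashing to a precomputed key.

```python
from typing import Any, Type


class KeyCompareMixin:
    """Hypothetical mixin: equality and hashing delegate to a precomputed key."""

    def __init__(self, key: Any, defining_class: Type) -> None:
        self._compare_key = key
        self._defining_class = defining_class

    def __hash__(self) -> int:
        return hash(self._compare_key)

    def __eq__(self, other: Any) -> bool:
        # Only compare against instances of the same defining class.
        if not isinstance(other, self._defining_class):
            return NotImplemented
        return self._compare_key == other._compare_key


class SpecRequirement(KeyCompareMixin):
    """Toy requirement keyed on its string form."""

    def __init__(self, text: str) -> None:
        self.text = text
        super().__init__(key=text, defining_class=SpecRequirement)


a = SpecRequirement("requests>=2.31")
b = SpecRequirement("requests>=2.31")
c = SpecRequirement("requests>=2.32")
unique = {a, b, c}  # a and b collapse into one entry
```

Two distinct objects with the same key now behave as one dictionary or cache key, which is what makes them usable as lru_cache arguments.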

sbidoul · 2023-12-28T15:57:21Z

The root cause of the perf issue is likely related to sarugaku/resolvelib#147

pfmoore · 2023-12-28T16:07:06Z

+1 on the general approach. Making InstallRequirement hashable and comparable in its own right can be a follow-up PR if it's complex (and it looks like it could be complex).

There's a risk of incorrect results caused by two requirements which compare as equal but actually aren't (for example, two URL requirements with the same target but different auth data), but I'm assuming that risk is small enough to be acceptable.
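The failure mode described here can be reproduced in a toy setting. The sketch below assumes a hypothetical UrlRequirement whose comparison key drops the credentials; whether any real pip requirement compares this way is not claimed. The can_fetch function is likewise invented for the example.

```python
from functools import lru_cache
from urllib.parse import urlsplit


def key_without_auth(url: str) -> str:
    """Hypothetical comparison key that ignores any user:password part."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}{parts.path}"


class UrlRequirement:
    """Toy URL requirement that compares on the auth-less key only."""

    def __init__(self, url: str) -> None:
        self.url = url
        self._key = key_without_auth(url)

    def __hash__(self) -> int:
        return hash(self._key)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, UrlRequirement) and self._key == other._key


@lru_cache(maxsize=None)
def can_fetch(requirement: UrlRequirement) -> bool:
    # Stand-in for a check whose real answer depends on the credentials.
    return urlsplit(requirement.url).password == "good"


with_good_auth = UrlRequirement("https://user:good@example.com/pkg.whl")
with_bad_auth = UrlRequirement("https://user:bad@example.com/pkg.whl")

first = can_fetch(with_good_auth)  # computed from the good credentials
second = can_fetch(with_bad_auth)  # cache hit on the equal key, so the stale answer is reused
```

Both calls return True even though an uncached check on the second URL would return False. A cache is only as correct as the equality it is keyed on.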

sbidoul · 2023-12-28T16:26:08Z

src/pip/_internal/resolution/resolvelib/candidates.py

+        KeyBasedCompareMixin.__init__(
+            self, key=(self.name, self.version), defining_class=self.__class__
+        )
+


Here it might be safer to hash and compare on self.dist.

Although this may change the logic?

pradyunsg · 2023-12-28T18:01:23Z

What's the contains that dominates is_satisfied_by?

sbidoul · 2023-12-28T18:05:36Z

> What's the contains that dominates is_satisfied_by?

It's from packaging.specifiers.

notatallshaw · 2023-12-28T19:25:43Z

> The root cause of the perf issue is likely related to sarugaku/resolvelib#147

Yes, resolvelib's algorithm checks at each step whether each requirement has been satisfied: https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L217, called here https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L409 and here https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L443.

So for 500 requirements that are not top-level, there will be at least 500 steps, and each step checks whether each of the 500 requirements has already been resolved. For as-yet unpinned requirements this involves checking each requirement's requirements, leading to at least O(n^2) behavior. Ideally there would be an algorithmic fix for this, but I've not yet found one.

Also, I believe this PR potentially fixes this user's issue, #12314 (though reproducing the user's results has been challenging).
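To make the quadratic growth concrete, here is a toy count under a simplifying assumption: n steps, each of which re-checks all n requirements against the same candidate. It is a model of the description above, not of resolvelib's actual control flow, and count_expensive_checks is a name made up for this sketch.

```python
def count_expensive_checks(n: int, use_cache: bool) -> int:
    """Count expensive checks in a toy model of n steps over n requirements."""
    cache = {}
    expensive = 0
    for _step in range(n):
        for requirement in range(n):
            # Assumption: each requirement is always checked against the
            # same candidate, so the pair repeats on every step.
            key = (requirement, requirement)
            if use_cache and key in cache:
                continue  # cheap dictionary lookup instead of a real check
            expensive += 1
            cache[key] = True
    return expensive


uncached = count_expensive_checks(500, use_cache=False)  # 500 * 500 = 250000
cached = count_expensive_checks(500, use_cache=True)  # 500
```

In this model the cache does not reduce the number of calls, which stays quadratic; it only turns all but the first call per pair into a lookup.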

notatallshaw · 2023-12-28T19:40:37Z

By the way for this specific use case:

> common use case, namely the installation of a project with locked dependencies using pip install -c requirements.txt -e .

I do have a separate idea that could fix it algorithmically: sarugaku/resolvelib#146. But I've not done any work towards a PR yet, so I can't compare.

notatallshaw · 2023-12-28T19:43:45Z

src/pip/_internal/resolution/resolvelib/provider.py

        )

+    @lru_cache(maxsize=None)


It would be good to check how much this increases memory consumption for a worst case scenario, i.e. when ResolutionTooDeep is reached.

This set of requirements can reach that: sphinx sphinx-rtd-theme sphinx-toolbox myst-parser sphinxcontrib-bibtex nbsphinx. I can test this in the new year if you'd like.

I experimented with dry-run installs of a few different requirements. Looking at apache-airflow[all], I saw peak memory usage increase from 298 MB to 306 MB, and the time to completion drop from ~2m 41s to ~2m 12s.

Personally, this seems well worth the memory/time trade-off to me (and looking at the memray flamegraph of these installs, there are potentially big areas for improvement in other parts of pip's codebase to use less memory).
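The numbers above came from memray on real installs. As a smaller, standard-library-only way to see how an unbounded lru_cache grows, the sketch below uses tracemalloc and cache_info() on a stand-in function; is_satisfied and the generated package names are hypothetical, and the figures it prints say nothing about pip itself.

```python
import tracemalloc
from functools import lru_cache


@lru_cache(maxsize=None)
def is_satisfied(requirement: str, candidate: str) -> bool:
    # Stand-in check; the real one is pip's, not this string comparison.
    return requirement.split("==")[0] == candidate.split("-")[0]


tracemalloc.start()
for i in range(20_000):
    # Every pair is distinct, so every call adds a new cache entry.
    is_satisfied(f"pkg{i}==1.0", f"pkg{i}-1.0")
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

info = is_satisfied.cache_info()
print(f"entries={info.currsize} peak={peak_bytes / 1e6:.1f} MB")
```

With maxsize=None nothing is ever evicted, so the entry count equals the number of distinct argument pairs seen. That is the quantity to watch in a worst case such as a resolution that runs until ResolutionTooDeep.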

notatallshaw · 2024-02-19T14:01:28Z

Is there a reason this is still in draft mode? It would be great to see this land this year.

sbidoul · 2024-02-19T14:11:05Z

@notatallshaw thanks for the ping. I think this was ready, but it needs double checking for correctness. So I marked it ready for review. I'll add the news entry soon.

notatallshaw · 2024-04-09T20:03:38Z

Testing performance today on Python 3.12 with all packages cached (I ran the command twice to populate the cache), pip install --dry-run apache-airflow[all] took ~5 min 50 s on main and ~3 min 10 s on this branch.

This is a huge performance improvement for large real-world dependency trees.

sbidoul · 2024-04-21T11:38:34Z

@pradyunsg I tentatively added this to 24.1. Feel free to postpone if you are not comfortable with this.

There will be a minor conflict with #12300, which I'll be happy to resolve, whichever of the two is merged first.

sbidoul · 2024-04-21T11:44:08Z

Oh wait, the KeyBasedCompareMixin I was relying on here has been removed in #12571. I'm resetting to draft.

ichard26 · 2024-04-21T14:30:51Z

Sorry about that @sbidoul! I suppose in hindsight it would've been better to not remove that mixin.

sbidoul · 2024-04-21T14:32:10Z

No worries. I already pushed the update.

sbidoul · 2024-04-23T07:13:47Z

Thanks for the review and merge @uranusjr.

> I suppose in hindsight it would've been better to not remove that mixin.

@ichard26 no issue at all. Dead code must be removed without mercy ;)

sbidoul mentioned this pull request Dec 28, 2023

path_to_url called millions of times for ~1000 offline wheel installs #12320

Closed

1 task

notatallshaw reviewed Dec 28, 2023

notatallshaw mentioned this pull request Jan 12, 2024

For n packages there are O(n^2) calls to _is_current_pin_satisfying sarugaku/resolvelib#147

Open

sbidoul force-pushed the cache-is_satified_by-sbi branch from 4f6cc32 to 98c5cae Compare January 15, 2024 12:15

notatallshaw mentioned this pull request Jan 23, 2024

Max Backtracking Option and print out current failure casues #10417

Open

sbidoul marked this pull request as ready for review February 19, 2024 14:07

notatallshaw mentioned this pull request Apr 9, 2024

Release 24.1 #12613

Closed

sbidoul added 3 commits April 21, 2024 13:29

Cache is_satisfied_by

f5480f1

Use the caching variant of is_satified_by in Factory.find_candidates

e16867e

Optimize hashing and comparison of AlreadyInstalledCandidate

705ce7d

sbidoul force-pushed the cache-is_satified_by-sbi branch from 98c5cae to 705ce7d Compare April 21, 2024 11:33

Add news

efab550

psf-chronographer bot added the bot:chronographer:provided label Apr 21, 2024

sbidoul added this to the 24.1 milestone Apr 21, 2024

sbidoul removed this from the 24.1 milestone Apr 21, 2024

sbidoul marked this pull request as draft April 21, 2024 11:44

Don't use KeyBasedCompareMixin

38b5645

sbidoul marked this pull request as ready for review April 21, 2024 14:31

sbidoul added this to the 24.1 milestone Apr 21, 2024

uranusjr approved these changes Apr 23, 2024

uranusjr merged commit 6522547 into pypa:main Apr 23, 2024
24 checks passed

sbidoul deleted the cache-is_satified_by-sbi branch April 23, 2024 07:12

This was referenced Apr 27, 2024

New resolver takes 1-2 hours to install a large requirements file #12314

Closed

Cache calculated hashes for requirements and _InstallRequirementBackedCandidate #12657

Merged

github-actions bot locked as resolved and limited conversation to collaborators May 9, 2024

ichard26 added the type: performance Commands take too long to run label Dec 26, 2024
