Back jump doesn't do well in some cases #180

Open
frostming opened this issue Dec 2, 2024 · 6 comments

Comments

@frostming
Member

frostming commented Dec 2, 2024

Here is the package set to be resolved:

1. A==1
2. B  # B@2 depends on D@2 and B@1 depends on D@1
3. C
4. D  # D@2 depends on A@2 and D@1 depends on A@1

Assume the resolver resolves packages from top to bottom. When resolving D, the current pins are:

A@1
B@2
C@x

Because B@2 is pinned, which constrains D==2, we can't find a version of D that satisfies both that constraint and A@1. So a conflict happens, and we back-jump by popping the last pinned package, C, which is totally unrelated to the incompatibilities.
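The scenario above can be sketched with a toy model (hypothetical data and helper, stdlib only; this is not resolvelib's API) to show that popping C cannot resolve the conflict, while jumping back to B can:

```python
# Toy dependency universe from the example above (hypothetical data).
deps = {
    ("B", 2): {"D": {2}},   # B@2 depends on D==2
    ("B", 1): {"D": {1}},   # B@1 depends on D==1
    ("D", 2): {"A": {2}},   # D@2 depends on A==2
    ("D", 1): {"A": {1}},   # D@1 depends on A==1
}

def compatible_versions(name, pins):
    """Versions of `name` allowed by every pinned package's dependencies,
    whose own dependencies are also satisfied by the current pins."""
    allowed = {1, 2}
    for pkg, ver in pins.items():
        constraint = deps.get((pkg, ver), {}).get(name)
        if constraint is not None:
            allowed &= constraint
    return {
        v for v in allowed
        if all(d not in pins or pins[d] in vs
               for d, vs in deps.get((name, v), {}).items())
    }

print(compatible_versions("D", {"A": 1, "B": 2, "C": 1}))  # set() -> conflict
print(compatible_versions("D", {"A": 1, "B": 2}))          # set(): popping C doesn't help
print(compatible_versions("D", {"A": 1, "B": 1, "C": 1}))  # {1}: jumping to B fixes it
```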

However, due to these lines in the backjump implementation:

# Only backjump if the current broken state is
# an incompatible dependency
if name not in incompatible_deps:
break

The backjump quits very early and falls back to backtracking. This would be extremely slow if traversing C is super expensive, yet it's obvious we should jump back to the state of B and try other versions. My question is: does the quit condition make sense in some case I am not aware of, or should we fix it?

@notatallshaw


To be concrete, A: h11, B: httpcore, C: Pillow, D: httpx

@notatallshaw
Collaborator

Yeah, the exit is probably too aggressive, but there are definitely cases that incorrectly reach resolution impossible without it.

I have at least a few real world examples to try against if you have an idea how to loosen this without causing incorrect resolutions.

@frostming
Member Author

frostming commented Dec 2, 2024

> I have at least a few real world examples to try against if you have an idea how to loosen this without causing incorrect resolutions.

I think we previously proposed to save the states first and, if no state to jump to is found, restore the states and fall back to backtracking.
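A minimal sketch of that save-and-restore idea, assuming a hypothetical `try_backjump` helper that mutates the state stack and reports whether a jumpable state was found (not resolvelib's actual code):

```python
import copy

def backjump_with_fallback(states, causes, try_backjump):
    """Snapshot the state stack, attempt a backjump, and restore
    the snapshot (then plain backtrack) if no jumpable state exists."""
    snapshot = copy.deepcopy(states)      # save the states first
    if try_backjump(states, causes):      # mutates `states` on success
        return states, "backjumped"
    states[:] = snapshot                  # no jumpable state: restore...
    states.pop()                          # ...and fall back to backtracking
    return states, "backtracked"

# Usage with a trivial stand-in for the real backjump logic:
states = [{"pin": "A@1"}, {"pin": "B@2"}, {"pin": "C@1"}]
result, how = backjump_with_fallback(states, [], lambda s, c: False)
print(how, len(result))  # backtracked 2
```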

@notatallshaw
Collaborator

notatallshaw commented Dec 2, 2024

Btw, I have an intuition that if the backtrack causes are disjoint, i.e. every cause is incompatible with some other cause (or set of causes), resolvelib could aggressively backjump. But currently the provider has no way to indicate disjointness to the resolver.

I will experiment this week with disjointness and aggressive backjumping, to see if the intuition is correct.
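The disjointness intuition could be sketched like this, modeling each cause's constraint as a half-open `[low, high)` range of version tuples purely for illustration (this is not a proposed resolvelib API):

```python
def disjoint(range_a, range_b):
    """Two half-open version ranges [lo, hi) conflict (are disjoint)
    iff one ends at or before the other begins."""
    (lo_a, hi_a), (lo_b, hi_b) = range_a, range_b
    return hi_a <= lo_b or hi_b <= lo_a

def every_cause_conflicts(ranges):
    """True if every cause is incompatible with at least one other
    cause -- the condition under which, per the intuition above,
    the resolver could aggressively backjump."""
    return all(
        any(disjoint(ranges[i], ranges[j])
            for j in range(len(ranges)) if j != i)
        for i in range(len(ranges))
    )

# numpy>=2 (as [(2,0), (99,))) vs numpy<2 (as [(0,), (2,0))): disjoint.
print(every_cause_conflicts([((2, 0), (99,)), ((0,), (2, 0))]))  # True
# Overlapping ranges [(1,0), (3,0)) and [(2,0), (4,0)): not disjoint.
print(every_cause_conflicts([((1, 0), (3, 0)), ((2, 0), (4, 0))]))  # False
```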

@notatallshaw
Collaborator

> I think we previously proposed to save the states first, and if no state that can jump is found, restore the states and fallback to backtrack.

Yeah, though it wasn't very elegant; for example, it made complicated impossible resolutions much slower.

@notatallshaw
Collaborator

notatallshaw commented Dec 10, 2024

My understanding is that the way backjumping was originally implemented in resolvelib did not actually meet the prerequisites for backjumping to be logically sound. In particular, a resolvelib "cause" is distinct from what a "cause" might be in satisfiability theory: for example, you might end up with "causes" in resolvelib that carry the constraints numpy>1.3, numpy>2, numpy<2, when I think the logical causes should be reduced to the constraints numpy>2, numpy<2.
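That reduction can be sketched as follows, treating versions as tuples and handling only strict `>`/`<` bounds for simplicity (real specifiers are much richer; see `packaging.specifiers` for the full semantics):

```python
def reduce_bounds(constraints):
    """Reduce a list of (op, version) constraints to the tightest
    lower and upper bounds -- the logically binding "causes"."""
    lower, upper = None, None
    for op, v in constraints:
        if op == ">" and (lower is None or v >= lower):
            lower = v          # keep only the highest lower bound
        elif op == "<" and (upper is None or v <= upper):
            upper = v          # keep only the lowest upper bound
    out = []
    if lower is not None:
        out.append((">", lower))
    if upper is not None:
        out.append(("<", upper))
    return out

# numpy>1.3, numpy>2, numpy<2 reduces to numpy>2, numpy<2:
print(reduce_bounds([(">", (1, 3)), (">", (2,)), ("<", (2,))]))
# [('>', (2,)), ('<', (2,))]
```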

I was hoping to introduce a new API that allowed the provider to filter these causes down to only the logical ones, but I realized there's another case I wasn't considering (#171 (comment)), and I'm also concerned that the "mathematics" of pre-releases will have an unknown impact on trying to implement classical satisfiability theory (e.g. https://discuss.python.org/t/proposal-intersect-and-disjoint-operations-for-python-version-specifiers/71888/1).

All this is a long-winded way of saying I'm moving towards the opinion of introducing a save state. My thoughts are:

  1. Only create 1 save state
  2. Create that save state as late as possible
  3. Only revert to the save state if resolution impossible
  4. Give the provider no options(?), to simplify the logic

I think we would call this "optimistic backjumping": we backjump on the assumption that resolvelib causes are close enough to logical satisfiability causes, and we revert in the face of resolution impossible.
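Under those four rules, optimistic backjumping could be sketched as follows (hypothetical class and method names, not the actual PR):

```python
import copy

class OptimisticBackjumper:
    """Sketch of "optimistic backjumping" as described above:
    keep a single save state, created as late as possible, and
    revert to it only when resolution is reported impossible."""

    def __init__(self):
        self._save = None

    def before_backjump(self, states):
        # Rules 1 & 2: exactly one save state, taken at the last
        # moment before the first optimistic jump discards states.
        if self._save is None:
            self._save = copy.deepcopy(states)

    def on_resolution_impossible(self):
        # Rule 3: hand back the save state instead of failing, so the
        # caller can retry with plain backtracking (Rule 4: the
        # provider gets no say in this).
        restored, self._save = self._save, None
        return restored  # None if we never saved: genuinely impossible

bj = OptimisticBackjumper()
states = [{"pin": "A@1"}, {"pin": "B@2"}]
bj.before_backjump(states)
states.clear()  # an aggressive backjump threw the states away
print(bj.on_resolution_impossible())  # the saved stack comes back
```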

I am going to work on making a PR and running it against scenarios in https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks to see what the performance wins are.

@notatallshaw
Collaborator

notatallshaw commented Dec 14, 2024

I made a branch for pip that adds a commit for optimistic backjumping on top of resolvelib 1.1 (pypa/pip#13001) and then ran scenarios to compare them.

It comes with big wins and a big loss, here are the statistics comparing the resolvelib 1.1 branch to the optimistic backjumping branch:

Output from comparison using tools from Pip-Resolution-Scenarios-and-Benchmarks:

Difference for scenario scenarios/problematic.toml - autogluon:
        Both failed: Build Failure
        Distinct Wheels visited: 162 -> 163 (100.62%)
        Total visited packages: 2398 -> 2184 (91.08%)
        Total visited requirements: 22286 -> 19708 (88.43%)
        Total rejected requirements: 12791 -> 11089 (86.69%)
        Total pinned packages: 729 -> 740 (101.51%)
        Total rounds: 1283 -> 1220 (95.09%)

Difference for scenario scenarios/problematic.toml - pip-issue-12754:
        Both failed: Resolution Too Deep
        Distinct Sdists visited: 40 -> 81 (202.50%)
        Distinct Wheels visited: 82 -> 23 (28.05%)
        Total visited packages: 15004 -> 19031 (126.84%)
        Total visited requirements: 57744 -> 60022 (103.94%)
        Total rejected requirements: 55920 -> 52594 (94.05%)
        Total pinned packages: 8029 -> 8551 (106.50%)
        Total rounds: 15019 -> 15012 (99.95%)

Difference for scenario scenarios/problematic.toml - TTS:
        Total visited packages: 398 -> 383 (96.23%)
        Total visited requirements: 1099 -> 1084 (98.64%)
        Total rejected requirements: 581 -> 533 (91.74%)
        Total rounds: 300 -> 295 (98.33%)

Difference for scenario scenarios/problematic.toml - boto3-urllib3-transient:
        Success: False -> True
        Failure Reason: Resolution Too Deep -> None
        Distinct Sdists visited: 28 -> 0 (0.00%)
        Distinct Wheels visited: 2604 -> 141 (5.41%)
        Total visited packages: 419687 -> 297 (0.07%)
        Total visited requirements: 1436990 -> 665 (0.05%)
        Total rejected requirements: 32320 -> 57 (0.18%)
        Total pinned packages: 7584 -> 277 (3.65%)
        Total rounds: 15001 -> 281 (1.87%)

Difference for scenario scenarios/problematic.toml - kedro-test:
        Total visited packages: 3797 -> 3843 (101.21%)
        Total visited requirements: 12093 -> 12395 (102.50%)
        Total rejected requirements: 3006 -> 3048 (101.40%)
        Total pinned packages: 2051 -> 2084 (101.61%)
        Total rounds: 2774 -> 2816 (101.51%)

Difference for scenario scenarios/problematic.toml - sphinx:
        Failure Reason: Resolution Too Deep -> Build Failure
        Distinct Sdists visited: 1 -> 4 (400.00%)
        Distinct Wheels visited: 124 -> 135 (108.87%)
        Total visited packages: 97751 -> 46035 (47.09%)
        Total visited requirements: 135827 -> 64137 (47.22%)
        Total rejected requirements: 137900 -> 64823 (47.01%)
        Total pinned packages: 8118 -> 3904 (48.09%)
        Total rounds: 15013 -> 7145 (47.59%)

Difference for scenario scenarios/problematic.toml - backtracks-to-old-scipy:
        Both failed: Build Failure
        Total visited packages: 235 -> 211 (89.79%)
        Total visited requirements: 967 -> 679 (70.22%)
        Total rejected requirements: 316 -> 236 (74.68%)
        Total rounds: 99 -> 91 (91.92%)

Difference for scenario scenarios/problematic.toml - apache-airflow-google-cloud:
        Success: True -> False
        Failure Reason: None -> Resolution Too Deep
        Distinct Wheels visited: 179 -> 195 (108.94%)
        Total visited packages: 184 -> 16791 (9125.54%)
        Total visited requirements: 520 -> 31559 (6069.04%)
        Total rejected requirements: 36 -> 3606 (10016.67%)
        Total pinned packages: 163 -> 14853 (9112.27%)
        Total rounds: 165 -> 15007 (9095.15%)

Difference for scenario scenarios/problematic.toml - django-stubs:
        Total visited packages: 38 -> 30 (78.95%)
        Total visited requirements: 190 -> 135 (71.05%)
        Total rejected requirements: 21 -> 10 (47.62%)
        Total rounds: 26 -> 25 (96.15%)

Difference for scenario scenarios/problematic.toml - kauldron-docs:
        Success: False -> True
        Failure Reason: Resolution Too Deep -> None
        Distinct Sdists visited: 17 -> 6 (35.29%)
        Distinct Wheels visited: 323 -> 303 (93.81%)
        Total visited packages: 119167 -> 4565 (3.83%)
        Total visited requirements: 372168 -> 27670 (7.43%)
        Total rejected requirements: 48082 -> 1812 (3.77%)
        Total pinned packages: 8605 -> 2362 (27.45%)
        Total rounds: 15005 -> 2830 (18.86%)

Some details to note: my tooling stops resolution at ~15k rounds, as so far I haven't found a resolution that completes between 15k and 200k (pip's default maximum) rounds.

There were no resolution differences for either popular packages or big packages, all the resolution differences were related to problematic resolutions.

Generally performance was much better; in particular, boto3-urllib3-transient and kauldron-docs went from resolution too deep to quickly solvable, which is the best win. And sphinx went from resolution too deep to build failure, which I consider a win because it gives the user information to apply lower bounds.

However, apache-airflow-google-cloud went from quickly completing to resolution too deep, which I consider the worst failure. This is the same error that the current pip 24.3 produces, and resolvelib 1.1 was going to solve it. I suspect that optimistic backjumping has skipped over an important branch in the resolution graph, and even with optimistic backjumping it cannot reach resolution impossible before the 15k rounds in my tooling (nor even the 200k rounds in pip 24.3).

I have some bug fixes and further optimizations to pip I want to test against, to see if they improve the "apache-airflow-google-cloud" situation at all, but due to obligations I don't think I will have any more time to dedicate to this in 2024. Currently this leaves me on the fence about optimistic backjumping.
