Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sys.path for package discovery #9849

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

pradyunsg
Copy link

Resolves #2500
Intend to resolve #4466 before coming out of draft mode here.

x-ref #2413, since this PR removes an assumption introduced in that PR that a missing site-packages directory means that there are no packages to be found.

Summary

  • Change installed package discovery logic to use sys.path rather than inferred paths based on sysconfig's purelib and platlib directories.
  • [TODO] Change uninstallation logic to skip over packages that are installed outside the working environment.
  • [TODO] Remove any dead code, if the site_packages we were computing for this is not used anywhere else.

Test Plan

Only manually tested as of now.

The plan for automated tests is: Use a .pth file as an extra path to be injected to the virtual environment being modified, with the extra path pointing to site-packages from another virtual environment.

This should be less complicated than the proposed approaches of using sitecustomize.py as well as relying on system-site-packages with a mock system interpreter -- all these approaches modify the sys.path in a manner visible to the interpreter introspection script and using a .pth file is likely to be the easiest to reason about.

This better reflects the intended behavior of this oject, which is to
serve as a database of InstalledPackages rather than a representation of
the relevant environment/interpreter path.
@charliermarsh
Copy link
Member

Wow, new contributor just dropped!

Screenshot 2024-12-12 at 3 15 03 PM

@charliermarsh charliermarsh self-assigned this Dec 12, 2024
@pradyunsg
Copy link
Author

sigh

Locally, I'm hitting rust-lang/rust-analyzer#16959 trying to diagnose what's happening in the two spots that the most recent commit has added more granular bail! logs to, and looks like I can't step through with a debugger on this without resorting to hackery around CARGO_MANIFEST_DIR. 😞

All the tests run via the VS Code UI for debug end up panicking before running with:

---- tool_upgrade::tool_upgrade_multiple_names stdout ----
thread 'tool_upgrade::tool_upgrade_multiple_names' panicked at crates/uv/tests/it/common/mod.rs:292:84:
called `Result::unwrap()` on an `Err` value: NotPresent

Looks like I'll be falling back to using regular println! to diagnose things. 🌈

@pradyunsg
Copy link
Author

pradyunsg commented Dec 13, 2024

Doing a brain dump since I need a break now -- it took me a bit of time to separate a bunch of tooling failures/noise, and actually understand what's broken here...

let interpreter = interpreter.with_virtualenv(virtualenv);

The virtual environment's Interpreter object is created based on the base environments' Interpreter object. This step, notably, does not update the information about sys.path, which keeps pointing to paths from the base environment. A bunch of the failures in tests are happening since the logic ends up using a "stale" sys.path information from the base environment when working with the virtual environment's interpreter.

An easy reproducer for this, with this PR, is using uv tool install <whatever> for a tool that doesn't already have an environment for it. You'll see it fail to find the distributions that were just installed in the virtualenv created.

This used to be fine earlier since the logic didn't care at all about sys.path and the interpreter's prefix was being updated which was effectively "updating" the site-packages path that'd be used for the lookups. Ignoring to update sys.path is not viable, at least, not with the goal of this change of using appropriately using sys.path for package discovery.

Still thinking through what to do here... The "slow" option here would be to recreate the Interpreter object with the interpreter query logic again (caching that as appropriate). However, the only attribute that's incorrect in a manner that matters semantically today is sys.path and all the other attributes could be inferred from the base Python. Plus, this approach is also gonna make nearly all locations where uv works with a virtualenv it just created meaningfully slower by requiring it to wait on the Python interpreter query logic ~every time (it's very unlikely that the user will have the newly created interpreters' information cached).

@pradyunsg
Copy link
Author

Ok, an alternative that I haven't fully thought through the implications of... I think this is more of a reasonable guess of what sys.path will be, but it might just work. Here's the idea:

Instead of querying the interpreter again, use the base_interpreter.sys_path_after_site_import + [virtualenv.site_packages] when system-site-packages is true / use base_interpreter.sys_path_before_site_import + [virtualenv.site_packages] when system-site-packages is false. This, of course, relies on the fact that python lets us disable site on startup & uv has control over the complete lifetime/invocation for its interpreter introspection logic.

/opt/pradyunsg/bin/python3.12 -S -c "import sys; print(sys.path)"                                                
['', '/opt/pradyunsg/lib/python312.zip', '/opt/pradyunsg/lib/python3.12', '/opt/pradyunsg/lib/python3.12/lib-dynload']/opt/pradyunsg/bin/python3.12 -c "import sys; print(sys.path)" 
['', '/opt/pradyunsg/lib/python312.zip', '/opt/pradyunsg/lib/python3.12', '/opt/pradyunsg/lib/python3.12/lib-dynload', '/opt/pradyunsg/lib/python3.12/site-packages']

Let me know if this seems reasonable, and also... I'd be very happy to hear if someone can think of a scenario where this might be broken!

@paveldikov
Copy link
Contributor

paveldikov commented Dec 15, 2024

I'd be very happy to hear if someone can think of a scenario where this might be broken!

Broken might be a bit harsh, but 'incomplete'?

  • this will indeed work for system-site-packages, since the paths all come from the base interpreter anyway
  • but I don't think it will work for the broader pattern of modifying the module search path via sitecustomize.py/.pth files?
    • This is commonly used to implement 'layered'/'stacked' venvs (see comment)

I do note that the former usage is much more common than the latter.

If the 'correct' thing to do is unacceptably slow -- and the guess is good enough for most use cases -- maybe an acceptable compromise is to do the 'good enough' thing by default and provide an option for those who really need that extra correctness?

@pradyunsg
Copy link
Author

pradyunsg commented Dec 16, 2024

sitecustomize.py/.pth files

You won't have these in an environment that was just-created in-process by uv, as I understand it. You could have these in the base environment though, so the "guess" model doesn't work in general for environments with include-system-site-packages = true.

I think that's fine. We could stick with this computational approach for environments with include-system-site-packages = false, since that's likely the 90% case and fall back to doing the subprocess for environments with include-system-site-packages = true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants