Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hash instead of full wheel name in wheels bucket #11738

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

nkitsaini
Copy link

@nkitsaini nkitsaini commented Feb 24, 2025

Summary

Closes #2410

This changes the name of files in wheels bucket to use a hash instead of the wheel name as to not exceed maximum file length limit on various systems.

This only addresses the primary concern of #2410. It still does not address:

  • Path limit of 260 on windows: Too long cache filenames #2410 (comment)
    To solve this we need to opt-in to longer path limits on windows (ref), but I think that is a separate issue and should be a separate MR.
  • Exceeding filename limit while building a wheel from source distribution
    As per my understanding, this is out of uv's control. Name of the output wheel will be decided by build-backend used by the project. For wheels built from source distribution, pip also uses the wheel names in cache. So I have not touched sdists cache.

I have added a filename: WheelFileName field in Archive, so we can use it while indexing instead of relying on the filename on disk. Another way to do this was to read .dist-info/WHEEL and .dist-info/METADATA and build WheelFileName but that seems less robust and will be slower.

Test Plan

Tested by installing yt-dlp, httpie and sqlalchemy and verifying that cache files in wheels bucket use hash.

@@ -29,8 +29,8 @@ pub(crate) struct BuiltWheelMetadata {
impl BuiltWheelMetadata {
/// Find a compatible wheel in the cache.
pub(crate) fn find_in_cache(tags: &Tags, cache_shard: &CacheShard) -> Option<Self> {
for directory in files(cache_shard) {
Copy link
Author

@nkitsaini nkitsaini Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unrelated change. I got a bit confused here between directory/files while reading the code. I can move this to separate MR or remove it if required.

@nkitsaini nkitsaini force-pushed the cache-url-length branch 2 times, most recently from d49319c to 217edc2 Compare February 24, 2025 08:11
@nkitsaini nkitsaini marked this pull request as ready for review February 24, 2025 08:38
@nkitsaini nkitsaini changed the title Use hash instead of full wheel name in wheels cache Use hash instead of full wheel name in wheels bucket Feb 24, 2025
@charliermarsh
Copy link
Member

Thanks! Will review.

@charliermarsh charliermarsh added bug Something isn't working no-build Disable building binaries in CI labels Feb 25, 2025
@charliermarsh
Copy link
Member

I think this generally looks right, though I'm undecided on whether we should always do this, or only do it for wheels with filenames that exceed a certain length. It does hurt cache (plaintext) readability a bit which is inconvenient for debugging, since you can no longer infer the wheel tags etc. from the cached filename alone.

@charliermarsh
Copy link
Member

At least the hashes are consistent across these files, so you can look at WHEEL:

Screenshot 2025-02-25 at 9 55 25 AM

That's probably good enough.

@nkitsaini
Copy link
Author

so you can look at WHEEL:

I was doing a more hacky cat <file> and finding the tags/version.
image

Let me change the filenames to {trimmed_filename}-{hash}. Should be quick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working no-build Disable building binaries in CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Too long cache filenames
2 participants