Too long cache filenames #2410
Comments
Oh wow. But that actually is the filename as on PyPI, right? |
Certainly: https://pypi.org/project/SQLAlchemy/1.4.52/#files I think it's best to trim the name and add a hash if the file name exceeds a certain length. That would probably be the best solution (it would be a pity to lose the name from the cache and replace it with a meaningless hash, though many similar solutions do it exactly this way). |
Yeah, we just need some kind of deterministic encoding since it's content-addressed. (I.e., when we have that wheel name from PyPI, we need to know where to look in the cache.) Definitely doable! |
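A minimal sketch of the "trim + add hash" idea discussed above, not uv's actual implementation. The function name `shorten_filename`, the 143-byte budget, and the 16-character digest length are all assumptions chosen for illustration. The encoding is deterministic, so the same wheel name from PyPI always maps to the same cache entry, and a readable prefix survives for debugging.

```python
import hashlib

# Assumed budget: the eCryptfs figure quoted in this thread, not a uv constant.
MAX_BYTES = 143


def shorten_filename(name: str, max_bytes: int = MAX_BYTES) -> str:
    """Return `name` unchanged if it fits, else a trimmed prefix plus a hash."""
    raw = name.encode("utf-8")
    if len(raw) <= max_bytes:
        return name

    # Hash the full original name so distinct long names stay distinct.
    digest = hashlib.sha256(raw).hexdigest()[:16]

    # Keep a short extension such as ".whl"; otherwise treat the name as a bare stem.
    stem, dot, ext = name.rpartition(".")
    if not dot or len(ext) > 10:
        stem, dot, ext = name, "", ""

    suffix = "-" + digest + (dot + ext)
    budget = max_bytes - len(suffix.encode("utf-8"))
    # Trim on a byte budget; drop any multi-byte character cut in half.
    prefix = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return prefix + suffix
```

Because the hash is computed from the full name, two long names that share the same first 120 or so characters still end up in different files.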
It seems like 255 is the common limit on Linux, but that it can be as low as 144 if you have encryption enabled: https://stackoverflow.com/questions/6114301/git-checkout-index-unable-to-create-file-file-name-too-long/6114588#6114588 |
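For reference, a small sketch of how the per-directory limit could be queried at runtime instead of hard-coding 255. `os.pathconf` with `"PC_NAME_MAX"` is a standard Unix call; the helper name `name_max` and the fallback value are assumptions, and whether a given encrypted filesystem reports its reduced limit through this call would need to be verified on the affected machine.

```python
import os

DEFAULT_NAME_MAX = 255  # common Linux limit mentioned above; used only as a fallback


def name_max(directory: str, default: int = DEFAULT_NAME_MAX) -> int:
    """Return the longest file name (in bytes) the filesystem at `directory` reports."""
    try:
        limit = os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        # os.pathconf does not exist on Windows; OSError/ValueError cover
        # missing directories and unsupported configuration names.
        return default
    # A non-positive value means "no definite limit"; fall back to the default.
    return limit if limit > 0 else default
```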
I'm curious what pip does if you try to download that wheel :) |
Yep. My homedir where I hit it (my Linux workstation with Debian Mint) is indeed encrypted. And the reason it is 143 (it seems) is explained here: https://wiki.archlinux.org/title/ECryptfs#Deficiencies As of newer version of pip (v23.3+) |
Yeah, there's no inherent reason that we need to use full-fidelity filenames here. It's just helpful for debugging. |
Suggestion: Only use hash if you've hit too long name issue. |
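The suggestion above could look roughly like this: try the full-fidelity name first and fall back to a hash only when the filesystem rejects it. This is a sketch under assumptions, not how uv handles its cache; `write_entry`, `find_entry`, and `hashed_name` are hypothetical helpers, and the check relies on the OS reporting `ENAMETOOLONG`, which is the behaviour on Linux and macOS.

```python
import errno
import hashlib
import os
from typing import Optional


def hashed_name(name: str) -> str:
    """Deterministic fallback name: SHA-256 hex digest of the original name."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def write_entry(cache_dir: str, name: str, data: bytes) -> str:
    """Write `data` under `name`, falling back to a hashed name if it is too long."""
    path = os.path.join(cache_dir, name)
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError as err:
        if err.errno != errno.ENAMETOOLONG:
            raise
        # Only the entries that actually hit the limit lose their readable name.
        path = os.path.join(cache_dir, hashed_name(name))
        with open(path, "wb") as f:
            f.write(data)
    return path


def find_entry(cache_dir: str, name: str) -> Optional[str]:
    """Look up an entry by trying the full name first, then the hashed name."""
    for candidate in (name, hashed_name(name)):
        path = os.path.join(cache_dir, candidate)
        # os.path.exists returns False (rather than raising) for over-long names.
        if os.path.exists(path):
            return path
    return None
```

The lookup has to mirror the write path, since a reader that only knows the wheel name from PyPI cannot tell in advance which of the two names was used.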
@charliermarsh have we addressed this with recent changes to the cache? |
No, hasn’t changed yet |
Not trivial right now because we read the tags from the wheel name in the built wheel cache |
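To illustrate why the file name matters here: under the wheel file name convention (`{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`), the compatibility tags are the last three dash-separated fields, each of which may hold several dot-separated values. The sketch below is not uv's parser; `tags_from_wheel_name` and `WheelTags` are hypothetical names. It shows that replacing the name with an opaque hash would lose this information unless it is stored somewhere else.

```python
from typing import List, NamedTuple


class WheelTags(NamedTuple):
    python: List[str]
    abi: List[str]
    platform: List[str]


def tags_from_wheel_name(filename: str) -> WheelTags:
    """Extract the python/abi/platform tag sets from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    stem = filename[: -len(".whl")]
    # The last three fields are always the tags, even with an optional build tag.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    _, python, abi, platform = parts
    # Compressed tag sets join multiple values with "."; long platform lists
    # are what make some wheel names so long.
    return WheelTags(python.split("."), abi.split("."), platform.split("."))
```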
This can be a real pain, on my machine the uv default cache dir is: I'm currently struggling to use UV on CI with jenkins due to that:
I'm not sure how it got that long, I imagine it has to do with uv caching + setuptools building. |
For windows, there might be another solution, I just came across this https://github.com/BurntSushi/ripgrep/blob/master/pkg/windows/README.md, which talks about using a manifest to declare that the application works with paths longer than 260 characters (I suppose the OS libraries change their behaviour then). |
Yeah. Note though that, AIUI, the manifest is only 1 of 2 things that are required. The user also needs to apply a registry edit: https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later |
@potiuk Did you manage to use sqlalchemy with uv? I just tried edit: caption obvious: |
Hello @charliermarsh @zanieb - would it be possible to prioritise that one a bit now? To add a bit of context - we are now discussing the Airflow 3 approach of managing our complex dependencies and providers, and one of the options considered is to use Now - this one makes my life really hard ... Either I have to remove the cache or move the cache somewhere outside of my home directory to another filesystem (but then hard-links don't work) etc. etc. Also for me - this is quite a blocker to propose it - with our number of contributors (we just passed 3000!) - surely there will be some like me who have their home dir encrypted - especially since it's an option in multiple distros that you can select (for me - that came out of the box with my Linux Mint installation and as a security freak I obviously enabled it). Maybe a good solution will be to check if the filesystem is encrypted or simply react to "too long filename" and use a hash instead - there is no need to change it in bulk - simply handling the exception when it occurs should be good enough and have very little effect on debuggability. Looking forward to having this one solved (I'd do it myself - but I have far too many things on my plate now to learn Rust and all the internals of |
Yeah we def need to fix this. |
Hi everyone, the issue with long cache names also displays some cryptic error messages such as
If you shorten the cache dir to e.g. There is sometimes a second very long error message such as These messages are all irrelevant IMO as they just show where in time UV crashed due to a long file name, but the solution will be to shorten the path somehow. |
Just a small ping on that one.. We already started to move to uv for workspaces, and I got my Linux PC with encryption back from cooling system replacement. Would be nice to start using it with |
FYI. We've switched Airflow (and I also did it) to use I have also not heard of anyone else having a similar issue. Not sure what caused it though. We have not changed our requirements for sqlalchemy, but maybe either the algorithm that decides which packages to install or other "generic" error handling in From my perspective - this issue can be closed, I think I'd have seen it again with the heavy usage of |
There were some recent but minor improvements on some file names (according to the changelog on the 0.5.x line), but for us, on Windows, we still get inexplicable issues that go away when we change the cache path via pyproject.toml, which in turn causes other issues. There is also the issue with Windows Docker containers, which get an even smaller limit for file names. We were able to partially overcome that in our specific use case, but the whole thing is too buggy for my taste. Please keep the issue open. You may unsubscribe from it if you like |
Thanks @potiuk. I'll leave it open but I don't plan to prioritize any work here until we have clear reproductions and evidence of significant need. |
I can still be subscribed :) no worries. I have no plans from my side to close it - not my decision, I think it's far more important to see if there are - like @charliermarsh mentioned - more users affected and clear reproduction/evidence that it disrupts more users. From Airflow at least (and our 700+deps + the BTW. I'd personally agree with @charliermarsh - and even more - qualify that issue as a "fix it if you need" kind of thing. Since this seems like a very low/rare impact issue - I'd say to anyone who wants to get it fixed - "Yeah we know it, and we know how low impact it has. If you really want to have it solved, feel free to provide a complete PR that we can review and merge, that will not disrupt the But that's just me - with a side comment and assessment. It's just the question of what else (more important) will not be done if that one is prioritized (i.e. "missed opportunities"). |
@charliermarsh What else do we need for a reproduction?
Do you have any idea what else I can investigate to help there? |
I don't think we need anything else for a reproduction. We just need to find time to prioritize this. |
I don't have much of an idea of the changes required right now, but I am planning to start working on the implementation. I am planning to use a msgpack file for storing the tags, and maybe the full filename itself. I am currently going with
|
Sorry - pressed enter too fast.
Some combinations of platform / packages might generate too long cache file names. Tested with 0.1.18 on my Mint Linux (reproducible with main airflow).