Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Eagerly normalize VendoredPathBufs #11989

Closed
wants to merge 6 commits into from

Conversation

AlexWaygood
Copy link
Member

Summary

Currently VendoredPaths are only normalized when you actually try to use them to lookup a path in the vendored zip archive. This has a couple of disadvantages:

  • The path is only "dynamically" checked when you actually try to use it. If the path is invalid, we should probably validate it and normalize it at the point where it's constructed, so that it's impossible to create a VendoredPath(Buf) to begin with.
  • Every time you query whether a path exists (or query some other metadata about the path) in the zip archive, the normalization has to allocate a new String. But this allocation is hidden from the user of the VendoredFileSystem APIs, and feels somewhat wasteful if you're making several queries with the same path.

This PR changes the design of the VendoredFileSystem, VendoredPath and VendoredPathBuf so that normalization is done eagerly at the point when VendoredPath and VendoredPathBuf are created, rather than lazily when you try to use them to query paths in the zip archive. The main disadvantage of this is that it becomes a lot more annoying to construct a VendoredPath from an &str. Previously you could do VendoredPath::new("foo.pyi"); now you must do &VendoredPathBuf::try_from("foo.pyi").unwrap().

I'm not wedded to this PR as it does feel like it makes the API somewhat more awkward to use. But I said I'd look into this as a followup for #11863 (see e.g. #11863 (comment)). So this is the followup!

Test Plan

cargo test -p ruff_db

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jun 23, 2024
Copy link
Contributor

github-actions bot commented Jun 23, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@MichaReiser
Copy link
Member

I think my preferred solution here would be to improve the normalization logic to avoid allocating if the path's already normalized. But I'm not opposed to do the normalization eagerly.

If we so the normalization eagerly, than I don't think the Path variant still makes sense, we should just use &PathBuf

What's unclear to me is how the normalization works when joining paths because we would then have to normalize both paths before we can join them (which includes an allocation)

@MichaReiser
Copy link
Member

Reading through this more, I'm leaning towards keeping it as is because it makes the API more cumbersome to use without very convincing benefits.

The path is only "dynamically" checked when you actually try to use it. If the path is invalid, we should probably validate it and normalize it at the point where it's constructed, s

It's unclear to me if we actually do want to do this. I don't think there's anything wrong with VendoredPath::new("root/sub").join("../other") where ../other would no longer be a valid VendoredPath (because it starts with a ../)

Every time you query whether a path exists (or query some other metadata about the path) in the zip archive, the normalization has to allocate a new String.

I agree that this is not ideal but I think we can instead change the normalization to return a Cow and only allocate if the path isn't normalized.

I also expect that this won't be a very hot path because module resolution is cached. So we only resolve every module once.

@AlexWaygood AlexWaygood force-pushed the alex/eagerly-normalize-vendored-paths branch from 38a84d2 to 0775bc7 Compare June 23, 2024 18:40
@AlexWaygood
Copy link
Member Author

Reading through this more, I'm leaning towards keeping it as is because it makes the API more cumbersome to use without very convincing benefits.

Yeah, I am also unsure. But I think part of the issue is that the current implementation on main isn't very principled: it panics if it encounters an unnormalized path, when it should probably return an error instead. #11991 is an alternative to this PR that tries to do the normalization in a more principled way, and it also makes the APIs more cumbersome to use.

It's unclear to me if we actually do want to do this. I don't think there's anything wrong with VendoredPath::new("root/sub").join("../other") where ../other would no longer be a valid VendoredPath (because it starts with a ../)

Hmm, that's an interesting point. We could possibly have something like a join_str() method instead, that allows you to join a fragment to this path even if the fragment itself is not a valid path. But I agree that that's not ideal...

Copy link

codspeed-hq bot commented Jun 23, 2024

CodSpeed Performance Report

Merging #11989 will improve performances by 4.92%

Comparing alex/eagerly-normalize-vendored-paths (4bd1fe8) with main (068b75c)

Summary

⚡ 1 improvements
✅ 29 untouched benchmarks

Benchmarks breakdown

Benchmark main alex/eagerly-normalize-vendored-paths Change
linter/default-rules[pydantic/types.py] 1.9 ms 1.8 ms +4.92%

@AlexWaygood AlexWaygood force-pushed the alex/eagerly-normalize-vendored-paths branch from 8d377c3 to 4bd1fe8 Compare June 23, 2024 21:04
@AlexWaygood
Copy link
Member Author

Leaving this for the time being

@AlexWaygood AlexWaygood deleted the alex/eagerly-normalize-vendored-paths branch June 28, 2024 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants