Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve normalization of VendoredPaths #11991

Closed
wants to merge 1 commit into from

Conversation

AlexWaygood
Copy link
Member

Summary

This is a competing PR to #11989. Instead of making VendoredPath normalization eager, it makes the lazy normalization that we currently do more principled:

  • the normalization function checks for .. path components that attempt to "escape" from the zip archive altogether (e.g. VendoredPath("..") should be rejected)
  • the normalization function returns a Result instead of panicking if the VendoredPath is invalid.

Test Plan

cargo test -p ruff_db

Copy link

codspeed-hq bot commented Jun 23, 2024

CodSpeed Performance Report

Merging #11991 will improve performances by 4.89%

Comparing vendored-parts-parent-part (5a18ac8) with main (068b75c)

Summary

⚡ 1 improvements
✅ 29 untouched benchmarks

Benchmarks breakdown

Benchmark main vendored-parts-parent-part Change
linter/default-rules[pydantic/types.py] 1.9 ms 1.8 ms +4.89%

Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the many extra error types are worth the complexity, especially considering that vendored paths isn't something that comes from externally.

I generally like narrow error types, but they can be challenging to work with. metadata and read now return two different, disjoint errors. This can be frustrating because a function that calls both read and metadata now must define its own error type (or use anyhow) to propagate both errors, but that's as good as just using io::Error.

The other question is. Is it useful to know that reading a file failed because the path was invalid or because there's an other IO error? Would a caller handle these two cases differently? I would say probably not? I would probably call unwrap for invalid paths because I know they are well formed (we control the construction) and there's nothing I can do about IO errors other than propagating them.

I'm leaning towards returning NotFound if a path has too many ../ components. Prefixes are a bit more awkward. I'm leaning toward just panicking, because this is a dev error.

The alternative is to handle prefixes similar to root where you take the starting prefix (win only) and concatenate it with the next component to get "UNIX" like semantics.

Comment on lines +125 to +131
#[derive(Debug, thiserror::Error)]
pub enum MetadataLookupError {
#[error("{0:?} does not exist in the zip archive")]
NonExistentPath(VendoredPathBuf),
#[error("{0}")]
InvalidPath(#[from] InvalidVendoredPathError),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dealing with different errors can often times be more difficult than just having one error, even if it contains more variant than are actually possible.

For example, someone calling metdata and read now needs to handle both MetadataLookupError and ZipFileReadError but there's no way to convert one error to the other. It forces the caller to define its own error type that is a union over the two.

Comment on lines +302 to +304
if normalized_parts.pop().is_none() {
return Err(InvalidVendoredPathError::EscapeFromZipFile);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quickly checked what the normal FS operations return when you navigate outside the cwd. They just return a NotFound error, which is probably also enough for us.

Comment on lines +306 to +309
unsupported => {
return Err(InvalidVendoredPathError::UnsupportedComponent(
unsupported.to_string(),
))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that this is only to handle prefixes? I guess the problem we run into here is that camino::Utf8Path path parsing is platform-dependent. I'm not sure how to solve this best but I'm also not sure if this special case is worth returning an Error (we control the path creation, it's not that they're provided from externally)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a camino::Utf8Component::RootDir... which I don't think is possible in the middle of a file, but that can't be validated using the type system

@MichaReiser
Copy link
Member

MichaReiser commented Jun 23, 2024

TIL: This is what std::path::absolute does

if path.as_os_str().is_empty() {
        Err(io::const_io_error!(io::ErrorKind::InvalidInput, "cannot make an empty path absolute",))
    }

Uh, this is funny. Too many ../ hasn't any meaning. It just means the path starts at /.

assert_eq!(
            std::path::Path::new("a"),
            std::path::Path::new(
                "../../../../../../../../../../../../../../../../../../../../../../home"
            )
            .canonicalize()
            .unwrap()
        );

Fails, and the path resolves to "/System/Volumes/Data/home". I can add as many ../ as I want and the path keeps resolving to /home. So I think you're implementation was actually correct.

Taking all this into consideration. I would return io::ErrorKind::InvalidInput when seeing a Prefix component

@AlexWaygood
Copy link
Member Author

Uh, this is funny. Too many ../ hasn't any meaning. It just means the path starts at /.

Huh. Right, yeah, that makes sense. We're not trying to "escape from the zip archive"; we've just reached the root of the zip archive, and we can't go any further.

@AlexWaygood
Copy link
Member Author

I think the conclusion here is that none of the changes here are needed:

  • Panicking is the correct thing to do if we encounter prefixes in the normalization routine. We control the inputs, so that just indicates a dev error on our part if we come across them, and the correct response to that is to panic rather than return an error
  • .. parts should be treated the same way as in OS paths, and for OS paths Rust allows you to apply an arbitrary number and doesn't error out (they're just treated as no-ops if you hit the root dir)

Thanks for talking this through with me, I appreciate it!

@AlexWaygood AlexWaygood deleted the vendored-parts-parent-part branch June 23, 2024 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants