Improve normalization of `VendoredPath`s #11991

AlexWaygood · 2024-06-23T18:21:34Z

Summary

This is a competing PR to #11989. Instead of making `VendoredPath` normalization eager, it makes the lazy normalization that we currently do more principled:

The normalization function checks for `..` path components that attempt to "escape" from the zip archive altogether (e.g. `VendoredPath("..")` should be rejected).
The normalization function returns a `Result` instead of panicking if the `VendoredPath` is invalid.

Test Plan

cargo test -p ruff_db

codspeed-hq · 2024-06-23T18:26:32Z

CodSpeed Performance Report

Merging #11991 will improve performances by 4.89%

Comparing `vendored-parts-parent-part` (5a18ac8) with `main` (068b75c)

Summary

⚡ 1 improvements
✅ 29 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `main` | `vendored-parts-parent-part` | Change |
|---|---|---|---|---|
| ⚡ | `linter/default-rules[pydantic/types.py]` | 1.9 ms | 1.8 ms | +4.89% |

github-actions · 2024-06-23T18:41:18Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser

I'm not sure the many extra error types are worth the complexity, especially considering that vendored paths aren't something that comes from an external source.

I generally like narrow error types, but they can be challenging to work with. metadata and read now return two different, disjoint errors. This can be frustrating because a function that calls both read and metadata now must define its own error type (or use anyhow) to propagate both errors, but that's as good as just using io::Error.

The other question is: is it useful to know whether reading a file failed because the path was invalid or because of another IO error? Would a caller handle these two cases differently? I would say probably not. I would probably call unwrap for invalid paths because I know they are well formed (we control the construction), and there's nothing I can do about IO errors other than propagating them.

I'm leaning towards returning NotFound if a path has too many ../ components. Prefixes are a bit more awkward. I'm leaning toward just panicking, because this is a dev error.

The alternative is to handle prefixes similar to root where you take the starting prefix (win only) and concatenate it with the next component to get "UNIX" like semantics.

MichaReiser · 2024-06-23T18:57:06Z

crates/ruff_db/src/vendored.rs

+#[derive(Debug, thiserror::Error)]
+pub enum MetadataLookupError {
+    #[error("{0:?} does not exist in the zip archive")]
+    NonExistentPath(VendoredPathBuf),
+    #[error("{0}")]
+    InvalidPath(#[from] InvalidVendoredPathError),
+}


Dealing with different errors can often be more difficult than just having one error, even if it contains more variants than are actually possible.

For example, someone calling metadata and read now needs to handle both MetadataLookupError and ZipFileReadError, but there's no way to convert one error to the other. It forces the caller to define its own error type that is a union over the two.
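A minimal sketch of the concern above, using only the standard library. The variant payloads, the shape of `ZipFileReadError`, the in-memory `KNOWN_PATH` lookup, and the names `CallerError`, `size_and_contents`, and the `*_io` functions are all hypothetical stand-ins for illustration; they are not the PR's actual definitions.

```rust
use std::io;

// Hypothetical stand-ins for the two disjoint error types in the diff.
#[derive(Debug)]
pub enum MetadataLookupError {
    NonExistentPath(String),
    InvalidPath(String),
}

#[derive(Debug)]
pub enum ZipFileReadError {
    InvalidPath(String),
    Io(io::Error),
}

// The union type a caller has to define to propagate both errors with `?`.
#[derive(Debug)]
pub enum CallerError {
    Metadata(MetadataLookupError),
    Read(ZipFileReadError),
}

impl From<MetadataLookupError> for CallerError {
    fn from(err: MetadataLookupError) -> Self {
        CallerError::Metadata(err)
    }
}

impl From<ZipFileReadError> for CallerError {
    fn from(err: ZipFileReadError) -> Self {
        CallerError::Read(err)
    }
}

// A single fake "archive entry" so the sketch runs without a zip file.
const KNOWN_PATH: &str = "stdlib/os.pyi";
const KNOWN_CONTENTS: &str = "import sys\n";

pub fn metadata(path: &str) -> Result<usize, MetadataLookupError> {
    if path == KNOWN_PATH {
        Ok(KNOWN_CONTENTS.len())
    } else {
        Err(MetadataLookupError::NonExistentPath(path.to_string()))
    }
}

pub fn read(path: &str) -> Result<String, ZipFileReadError> {
    if path == KNOWN_PATH {
        Ok(KNOWN_CONTENTS.to_string())
    } else {
        Err(ZipFileReadError::Io(not_found(path)))
    }
}

// Calling both functions requires the extra union type and two `From` impls.
pub fn size_and_contents(path: &str) -> Result<(usize, String), CallerError> {
    Ok((metadata(path)?, read(path)?))
}

fn not_found(path: &str) -> io::Error {
    io::Error::new(io::ErrorKind::NotFound, path.to_string())
}

// The alternative: both lookups share `io::Error`, so `?` works directly.
pub fn metadata_io(path: &str) -> io::Result<usize> {
    if path == KNOWN_PATH { Ok(KNOWN_CONTENTS.len()) } else { Err(not_found(path)) }
}

pub fn read_io(path: &str) -> io::Result<String> {
    if path == KNOWN_PATH { Ok(KNOWN_CONTENTS.to_string()) } else { Err(not_found(path)) }
}

pub fn size_and_contents_io(path: &str) -> io::Result<(usize, String)> {
    Ok((metadata_io(path)?, read_io(path)?))
}
```

With the narrow types, `size_and_contents` only compiles because of `CallerError` and its two conversions; with a shared `io::Error`, the same function needs no extra type at all.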

MichaReiser · 2024-06-23T19:01:18Z

crates/ruff_db/src/vendored.rs

+                if normalized_parts.pop().is_none() {
+                    return Err(InvalidVendoredPathError::EscapeFromZipFile);
+                }


I quickly checked what the normal FS operations return when you navigate outside the cwd. They just return a NotFound error, which is probably also enough for us.

MichaReiser · 2024-06-23T19:05:23Z

crates/ruff_db/src/vendored.rs

+            unsupported => {
+                return Err(InvalidVendoredPathError::UnsupportedComponent(
+                    unsupported.to_string(),
+                ))


Am I correct that this is only to handle prefixes? I guess the problem we run into here is that camino::Utf8Path path parsing is platform-dependent. I'm not sure how best to solve this, but I'm also not sure this special case is worth returning an Error (we control the path creation; the paths aren't provided by an external source).

Or a camino::Utf8Component::RootDir... which I don't think is possible in the middle of a file, but that can't be validated using the type system

MichaReiser · 2024-06-23T19:26:37Z

TIL: This is what std::path::absolute does

if path.as_os_str().is_empty() {
    Err(io::const_io_error!(io::ErrorKind::InvalidInput, "cannot make an empty path absolute",))
}

Uh, this is funny. Too many ../ hasn't any meaning. It just means the path starts at /.

assert_eq!(
    std::path::Path::new("a"),
    std::path::Path::new(
        "../../../../../../../../../../../../../../../../../../../../../../home"
    )
    .canonicalize()
    .unwrap()
);

Fails, and the path resolves to "/System/Volumes/Data/home". I can add as many ../ as I want and the path keeps resolving to /home. So I think your implementation was actually correct.

Taking all this into consideration, I would return io::ErrorKind::InvalidInput when seeing a Prefix component.
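The observation about surplus `..` components can be reproduced without depending on a `home` directory existing. The sketch below is only an illustration: `resolve_above_root` is a hypothetical helper name, and it assumes a platform where `/` resolves to an existing root directory.

```rust
use std::io;
use std::path::PathBuf;

/// Builds `/../../..` with `levels` parent components and asks the OS to
/// resolve it. `resolve_above_root` is a hypothetical helper for this sketch.
pub fn resolve_above_root(levels: usize) -> io::Result<PathBuf> {
    let mut path = PathBuf::from("/");
    for _ in 0..levels {
        // Each extra `..` tries to step above the root directory.
        path.push("..");
    }
    // `canonicalize` resolves the path against the real filesystem.
    path.canonicalize()
}
```

Comparing `resolve_above_root(25)` with `resolve_above_root(0)` shows whether the extra parent components changed the result; on the OS paths discussed above they should not, since `..` at the root stays at the root.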

AlexWaygood · 2024-06-23T20:31:05Z

Uh, this is funny. Too many ../ hasn't any meaning. It just means the path starts at /.

Huh. Right, yeah, that makes sense. We're not trying to "escape from the zip archive"; we've just reached the root of the zip archive, and we can't go any further.

AlexWaygood · 2024-06-23T20:39:26Z

I think the conclusion here is that none of the changes here are needed:

Panicking is the correct thing to do if we encounter prefixes in the normalization routine. We control the inputs, so coming across them just indicates a dev error on our part, and the correct response to that is to panic rather than return an error.
`..` parts should be treated the same way as in OS paths, and for OS paths Rust allows you to apply an arbitrary number and doesn't error out (they're just treated as no-ops if you hit the root dir).

Thanks for talking this through with me, I appreciate it!
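The two conclusions above can be sketched roughly as follows. This is not the code in `crates/ruff_db/src/vendored.rs`: it uses `std::path` instead of `camino` so that it is self-contained, the function name and `String` return type are assumptions, and panicking on every component other than normal, `.`, and `..` mirrors the `unsupported` arm in the diff rather than any confirmed final implementation.

```rust
use std::path::{Component, Path};

/// Lexically normalizes a path relative to the root of the archive.
/// Hypothetical sketch: name, signature, and return type are assumptions.
pub fn normalize_vendored_path(path: &Path) -> String {
    let mut normalized_parts: Vec<&str> = Vec::new();

    for component in path.components() {
        match component {
            Component::Normal(part) => {
                // Vendored paths are assumed to be valid UTF-8.
                normalized_parts.push(part.to_str().expect("non-UTF-8 path component"));
            }
            // `.` never changes the location.
            Component::CurDir => {}
            // `..` removes the last part; at the archive root there is
            // nothing to remove, so it is a no-op rather than an error.
            Component::ParentDir => {
                normalized_parts.pop();
            }
            // We control how these paths are constructed, so anything else
            // (e.g. a Windows prefix) is treated as a developer error.
            unsupported => panic!("unsupported component in a vendored path: {unsupported:?}"),
        }
    }

    normalized_parts.join("/")
}
```

Under these assumptions, `normalize_vendored_path(Path::new("../../stdlib/os.pyi"))` yields the same string as the path without the leading `..` components, which matches the "we've just reached the root of the zip archive" reading.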

Improve normalization of VendoredPaths

5a18ac8

AlexWaygood requested a review from MichaReiser June 23, 2024 18:21

AlexWaygood mentioned this pull request Jun 23, 2024

[red-knot] Eagerly normalize VendoredPathBufs #11989

Closed

MichaReiser reviewed Jun 23, 2024

AlexWaygood closed this Jun 23, 2024

AlexWaygood deleted the vendored-parts-parent-part branch June 23, 2024 20:39
