-
-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more checks for the validity of refnames #1672
Merged
Byron
merged 1 commit into
gitpython-developers:main
from
trail-of-forks:robust-refname-checks
Sep 22, 2023
+84
−2
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to avoid converting to
str
? I assume this tries to decoderef_path
with the current string encoding, which changes depending on the interpreter or user configuration and generally causes a lot of trouble.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this PR worsens that problem in some way, which I believe it does not, I would recommend it be fixed separately and later. The code this is replacing already had:
GitPython/git/refs/symbolic.py
Lines 171 to 172 in d40320b
But actually even that neither introduced nor exacerbated the problem. From the commit prior to #1644 being merged:
GitPython/git/refs/symbolic.py
Lines 164 to 174 in 830025b
Note how
str(ref_path)
was passed toos.path.join
, which when givenstr
s returns astr
, thus astr
was being passed toopen
. Note also that, while thisstr
call was actually redundant (os.path.join
accepts path-like objects since Python 3.6), even it was not the cause ofstr
and notbytes
being used. The annotation onref_path
isUnion[PathLike, None]
, wherePathLike
is:GitPython/git/types.py
Line 43 in 830025b
Where both alternatives--
str
andos.PathLike[str]
--represent text that has already been decoded.So unless I'm missing something--which I admit I could be--I don't think it makes conceptual sense to do anything about that in this pull request. Furthermore, unless the judgment that CVE-2023-41040 was a security vulnerability was mistaken, or something about the variation explicated in #1644 (comment) is less exploitable, it seems to me that this pull request is fixing a vulnerability. Assuming that is the case, then I think this should avoid attempting to make far-reaching changes beyond those that pertain to the vulnerability, and that although reviewing these changes for correctness should not be rushed, other kinds of delays should be mostly avoided. With good regression tests included, as seems to be the case, the code could be improved on later in other ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the thorough assessment, I wholeheartedly agree.
The 'how to handle paths correctly' issue is definitely one of the big breaking points in GitPython, but maybe, for other reasons, this wasn't ever a problem here.
Knowing this is on your radar, maybe one day there will be a solution to it.
gitoxide
already solves this problem, but it's easier when you have an actual type system and a standard library that makes you aware every step of the way.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ultimate goal to support both
str
-based andbytes
-based ref names and paths?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is correctness, and it's vital that one doesn't try to decode paths to fit some interpreter-controlled encoding. Paths are paths, and if you are lucky, they can be turned into bytes. On Unix, that's always possible and a no-op, but on windows it may require a conversion. It's just the question how these things are supposed to work in python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this relate (conceptually, I mean) to the issue in rust-lang/rust#12056?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A great find :) - yes, that's absolutely related.
gitoxide
internally handles git-paths as bundles of bytes without known encoding, and just likegit
, it assumes at least ASCII. Conversions do happen but they all go throughgix-path
to have a central place for it.Doing something like it would be needed here as well, even though I argue that before that happens universally, there should be some clear definition of what GitPython is supposed to be.
When I took it over by contributing massively, just like you do now, I needed more control for the use-case I had in mind, and started implementing all these sloppy pure-python components that don't even get the basics right. With that I turned GitPython into some strange hybrid which I think didn't do it any good besides maybe being a little faster for some usecases. After all, manipulating an index in memory has advantages, but there are also other ways to do it while relying on
git
entirely.Maybe this is thinking a step too far, but I strongly believe that the true benefit of GitPython is to be able to call
git
in a simple manner and to be compliant naturally due to usinggit
directly. This should be its identity.But then again, it's worth recognizing that changing the various pure-python implementations to us
git
under the hood probably isn't possible in a non-breaking way.Another avenue would be to try and get the APIs to use types that don't suffer from encoding/decoding issues related to Paths, and then one day make the jump to replacing the pure-python implementations with the python bindings of
gitoxide
.