-
Notifications
You must be signed in to change notification settings - Fork 21
discover git apply -pN
#240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new helper function, discover_patch_p, to automatically determine the correct -p value for applying git patches. This is a useful addition that improves the robustness of the GitPatchApplyTool. My review focuses on the implementation of discover_patch_p, where I've identified a significant opportunity for improvement in terms of robustness and correctness by using a more direct git command. The current implementation relies on parsing the output of git apply --stat, which can be fragile. I've suggested an alternative approach that should be more reliable.
| async def discover_patch_p(patch_file_path: AbsolutePath, repository_path: AbsolutePath) -> int: | ||
| """ | ||
| Process the given patch file and figure out with which `-p` value the patch should be applied | ||
| in the given repository. | ||
| Using `git apply --stat` we parse the given patch and try to fit it into the given repository. | ||
| """ | ||
| cmd = ["git", "apply", "--stat", str(patch_file_path)] | ||
| exit_code, stdout, stderr = await run_subprocess(cmd, cwd=repository_path) | ||
| if exit_code != 0: | ||
| # this means the patch is borked | ||
| raise ToolError(f"Command git-apply --stat failed: {stderr}") | ||
| # expat/lib/xmlparse.c | 8 - | ||
| # .github/workflows/scripts/mass-cppcheck.sh | 1 | ||
| # .github/workflows/data/exported-symbols.txt | 2 | ||
| # expat/lib/expat.h | 15 + | ||
| lines = stdout.splitlines() | ||
| files = [line.split("|")[0].strip() for line in lines if "|" in line] | ||
|
|
||
| # 0 should be impossible, git-apply hates it: | ||
| # "git diff header lacks filename information when removing 1 leading pathname component (line 5)" | ||
| # but how about /usr/bin/patch? can it handle "0"? | ||
| for n in range(1, 4): # I truly hope 3 is impossible | ||
| split_this_many = n - 1 | ||
| for fi in files: | ||
| stripped_fi = fi | ||
| if split_this_many > 0: | ||
| stripped_fi = fi.split("/", split_this_many)[-1] | ||
| if (repository_path / stripped_fi).exists(): | ||
| # I know this is naive, but we certainly cannot check all files | ||
| # because some may be missing in the checkout | ||
| return n | ||
| raise ToolError(f"Failed to discover the value for `-p` for patch file: {patch_file_path}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of discover_patch_p is quite complex and relies on parsing the output of git apply --stat. This approach has a few drawbacks:
- Fragility: It depends on the output format of
git apply --stat, which is not guaranteed to be stable across git versions. It also seems to rely on an undocumented behavior wheregit apply --statacts like it has-p1applied. - Incorrectness: The logic for stripping path components is flawed and can lead to incorrect results for file paths that have fewer components than the number being stripped. More importantly, the current logic does not correctly handle patches that require
-p0. - Complexity: The logic of iterating, stripping paths, and checking for existence is complex and hard to follow.
A much simpler and more robust approach is to use git apply --check -p<n>. This command is designed for this exact purpose: to check if a patch can be applied with a given -p value without actually modifying any files. It's more direct, less fragile, and handles all cases correctly.
I suggest replacing the current implementation with one that uses this command.
| async def discover_patch_p(patch_file_path: AbsolutePath, repository_path: AbsolutePath) -> int: | |
| """ | |
| Process the given patch file and figure out with which `-p` value the patch should be applied | |
| in the given repository. | |
| Using `git apply --stat` we parse the given patch and try to fit it into the given repository. | |
| """ | |
| cmd = ["git", "apply", "--stat", str(patch_file_path)] | |
| exit_code, stdout, stderr = await run_subprocess(cmd, cwd=repository_path) | |
| if exit_code != 0: | |
| # this means the patch is borked | |
| raise ToolError(f"Command git-apply --stat failed: {stderr}") | |
| # expat/lib/xmlparse.c | 8 - | |
| # .github/workflows/scripts/mass-cppcheck.sh | 1 | |
| # .github/workflows/data/exported-symbols.txt | 2 | |
| # expat/lib/expat.h | 15 + | |
| lines = stdout.splitlines() | |
| files = [line.split("|")[0].strip() for line in lines if "|" in line] | |
| # 0 should be impossible, git-apply hates it: | |
| # "git diff header lacks filename information when removing 1 leading pathname component (line 5)" | |
| # but how about /usr/bin/patch? can it handle "0"? | |
| for n in range(1, 4): # I truly hope 3 is impossible | |
| split_this_many = n - 1 | |
| for fi in files: | |
| stripped_fi = fi | |
| if split_this_many > 0: | |
| stripped_fi = fi.split("/", split_this_many)[-1] | |
| if (repository_path / stripped_fi).exists(): | |
| # I know this is naive, but we certainly cannot check all files | |
| # because some may be missing in the checkout | |
| return n | |
| raise ToolError(f"Failed to discover the value for `-p` for patch file: {patch_file_path}") | |
| async def discover_patch_p(patch_file_path: AbsolutePath, repository_path: AbsolutePath) -> int: | |
| """ | |
| Process the given patch file and figure out with which `-p` value the patch should be applied | |
| in the given repository. | |
| It tries to apply the patch with different `-p` values using `git apply --check` | |
| and returns the first one that works. | |
| """ | |
| # We check p=1 first as it's the default for `git am`. Then 0, then 2 and 3. | |
| for p in [1, 0, 2, 3]: | |
| cmd = ["git", "apply", "--check", f"-p{p}", str(patch_file_path)] | |
| exit_code, _, _ = await run_subprocess(cmd, cwd=repository_path) | |
| if exit_code == 0: | |
| return p | |
| raise ToolError(f"Failed to discover the value for `-p` for patch file: {patch_file_path}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that your implementation fails miserably when the patch introduces conflicts
agents/tools/wicked_git.py
Outdated
|
|
||
| # 0 should be impossible, git-apply hates it: | ||
| # "git diff header lacks filename information when removing 1 leading pathname component (line 5)" | ||
| # but how about /usr/bin/patch? can it handle "0"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it can, it's been a while since I did that though 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I never did -p0 myself and Gemini suggests it's possible to do as well:
Incorrectness: The logic for stripping path components is flawed and can lead to incorrect results for file paths that have fewer components than the number being stripped. More importantly, the current logic does not correctly handle patches that require -p0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this much more than the previous implementation 🙂
this implementation prints all files that a patch changes with `git apply --stat` and if at least one file from the patch is present in the repo, the function yields the respective N Signed-off-by: Tomas Tomecek <[email protected]> Assisted-by: Cursor(Claude)
c904678 to
ae9e78b
Compare
this implementation prints all files that a patch changes with
git apply --statand if at least one file from the patch is present in therepo, the function yields the respective N