Skip to content

Conversation

@TomasTomecek
Copy link
Member

this implementation prints all files that a patch changes with git apply --stat and if at least one file from the patch is present in the
repo, the function yields the respective N

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new helper function, discover_patch_p, to automatically determine the correct -p value for applying git patches. This is a useful addition that improves the robustness of the GitPatchApplyTool. My review focuses on the implementation of discover_patch_p, where I've identified a significant opportunity for improvement in terms of robustness and correctness by using a more direct git command. The current implementation relies on parsing the output of git apply --stat, which can be fragile. I've suggested an alternative approach that should be more reliable.

Comment on lines 109 to 142
async def discover_patch_p(patch_file_path: AbsolutePath, repository_path: AbsolutePath) -> int:
"""
Process the given patch file and figure out with which `-p` value the patch should be applied
in the given repository.
Using `git apply --stat` we parse the given patch and try to fit it into the given repository.
"""
cmd = ["git", "apply", "--stat", str(patch_file_path)]
exit_code, stdout, stderr = await run_subprocess(cmd, cwd=repository_path)
if exit_code != 0:
# this means the patch is borked
raise ToolError(f"Command git-apply --stat failed: {stderr}")
# expat/lib/xmlparse.c | 8 -
# .github/workflows/scripts/mass-cppcheck.sh | 1
# .github/workflows/data/exported-symbols.txt | 2
# expat/lib/expat.h | 15 +
lines = stdout.splitlines()
files = [line.split("|")[0].strip() for line in lines if "|" in line]

# 0 should be impossible, git-apply hates it:
# "git diff header lacks filename information when removing 1 leading pathname component (line 5)"
# but how about /usr/bin/patch? can it handle "0"?
for n in range(1, 4): # I truly hope 3 is impossible
split_this_many = n - 1
for fi in files:
stripped_fi = fi
if split_this_many > 0:
stripped_fi = fi.split("/", split_this_many)[-1]
if (repository_path / stripped_fi).exists():
# I know this is naive, but we certainly cannot check all files
# because some may be missing in the checkout
return n
raise ToolError(f"Failed to discover the value for `-p` for patch file: {patch_file_path}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of discover_patch_p is quite complex and relies on parsing the output of git apply --stat. This approach has a few drawbacks:

  • Fragility: It depends on the output format of git apply --stat, which is not guaranteed to be stable across git versions. It also seems to rely on an undocumented behavior where git apply --stat acts like it has -p1 applied.
  • Incorrectness: The logic for stripping path components is flawed and can lead to incorrect results for file paths that have fewer components than the number being stripped. More importantly, the current logic does not correctly handle patches that require -p0.
  • Complexity: The logic of iterating, stripping paths, and checking for existence is complex and hard to follow.

A much simpler and more robust approach is to use git apply --check -p<n>. This command is designed for this exact purpose: to check if a patch can be applied with a given -p value without actually modifying any files. It's more direct, less fragile, and handles all cases correctly.

I suggest replacing the current implementation with one that uses this command.

Suggested change
async def discover_patch_p(patch_file_path: AbsolutePath, repository_path: AbsolutePath) -> int:
"""
Process the given patch file and figure out with which `-p` value the patch should be applied
in the given repository.
Using `git apply --stat` we parse the given patch and try to fit it into the given repository.
"""
cmd = ["git", "apply", "--stat", str(patch_file_path)]
exit_code, stdout, stderr = await run_subprocess(cmd, cwd=repository_path)
if exit_code != 0:
# this means the patch is borked
raise ToolError(f"Command git-apply --stat failed: {stderr}")
# expat/lib/xmlparse.c | 8 -
# .github/workflows/scripts/mass-cppcheck.sh | 1
# .github/workflows/data/exported-symbols.txt | 2
# expat/lib/expat.h | 15 +
lines = stdout.splitlines()
files = [line.split("|")[0].strip() for line in lines if "|" in line]
# 0 should be impossible, git-apply hates it:
# "git diff header lacks filename information when removing 1 leading pathname component (line 5)"
# but how about /usr/bin/patch? can it handle "0"?
for n in range(1, 4): # I truly hope 3 is impossible
split_this_many = n - 1
for fi in files:
stripped_fi = fi
if split_this_many > 0:
stripped_fi = fi.split("/", split_this_many)[-1]
if (repository_path / stripped_fi).exists():
# I know this is naive, but we certainly cannot check all files
# because some may be missing in the checkout
return n
raise ToolError(f"Failed to discover the value for `-p` for patch file: {patch_file_path}")
async def discover_patch_p(patch_file_path: AbsolutePath, repository_path: AbsolutePath) -> int:
"""
Process the given patch file and figure out with which `-p` value the patch should be applied
in the given repository.
It tries to apply the patch with different `-p` values using `git apply --check`
and returns the first one that works.
"""
# We check p=1 first as it's the default for `git am`. Then 0, then 2 and 3.
for p in [1, 0, 2, 3]:
cmd = ["git", "apply", "--check", f"-p{p}", str(patch_file_path)]
exit_code, _, _ = await run_subprocess(cmd, cwd=repository_path)
if exit_code == 0:
return p
raise ToolError(f"Failed to discover the value for `-p` for patch file: {patch_file_path}")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that your implementation fails miserably when the patch introduces conflicts


# 0 should be impossible, git-apply hates it:
# "git diff header lacks filename information when removing 1 leading pathname component (line 5)"
# but how about /usr/bin/patch? can it handle "0"?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it can, it's been a while since I did that though 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I never did -p0 myself and Gemini suggests it's possible to do as well:

Incorrectness: The logic for stripping path components is flawed and can lead to incorrect results for file paths that have fewer components than the number being stripped. More importantly, the current logic does not correctly handle patches that require -p0.

Copy link
Member

@nforro nforro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this much more than the previous implementation 🙂

this implementation prints all files that a patch changes with `git
apply --stat` and if at least one file from the patch is present in the
repo, the function yields the respective N

Signed-off-by: Tomas Tomecek <[email protected]>
Assisted-by: Cursor(Claude)
@TomasTomecek TomasTomecek merged commit ea31ee3 into packit:main Oct 6, 2025
6 checks passed
@TomasTomecek TomasTomecek deleted the detect-apply-p-n branch October 6, 2025 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants