
Conversation

@TomasTomecek (Member) commented Aug 29, 2025

Got a successful run (the status comment below was generated with Gemini from the inline console output):

Status Details

Successfully backported three upstream commits to the git-lfs package for RHEL-122431.

Cherry-pick Workflow Used

  1. Cloned upstream repository from https://github.com/git-lfs/git-lfs.git
  2. Checked out base version v3.6.1 (commit ea47a34bde1ba1ecf9eb8dc6d44fcf7aef42f5e3)
  3. Applied existing downstream patch: 0001-Use-constant-format-string.patch
  4. Cherry-picked three commits with conflict resolution.

Commits Backported

Commit Hash                              | Subject                                             | Conflicts Resolved?
0cffe93176b870055c9dadbb3cc9a4a440e98396 | "check for dir/symlink conflicts on checkout/pull"  | Yes
d02bd13f02ef76f6807581cd6b34709069cb3615 | Applied cleanly                                     | No
5c11ffce9a4f095ff356bc781e2a031abb46c1a8 | "docs,lfs,t: create new files on checkout and pull" | Yes

Conflict Resolution Details

0cffe93176b870055c9dadbb3cc9a4a440e98396:
  • commands/command_checkout.go: Added path/filepath import and integrated directory/symlink conflict checking logic.
  • commands/pull.go: Added directory walker checks before decoding pointers, integrated path creation logic.
  • lfs/gitfilter_smudge.go: Changed from os.Stat to os.Lstat with IsRegular() check to handle symlinks properly.
  • t/t-checkout.sh: Updated test path and added two new test cases.
  • t/t-pull.sh: Updated test expectations and added two new test cases.

5c11ffce9a4f095ff356bc781e2a031abb46c1a8:
  • lfs/gitfilter_smudge.go: Re-added tools.MkdirAll call needed for directory creation.
  • t/t-checkout.sh: Removed duplicate conflict markers.
  • t/t-pull.sh: Removed duplicate conflict markers.

Patch Integration Summary

  • Generated combined patch file: RHEL-122431.patch
  • Added Patch1 tag to spec file with references to all three upstream commits.
  • Verified patch applies cleanly with centpkg prep (exit 0).
  • Successfully built SRPM: git-lfs-3.6.1-3.el9.src.rpm (3.5MB).

Conclusion: All conflicts were intelligently resolved by understanding the intent of each commit and adapting the changes to work with the git-lfs 3.6.1 codebase. The backport maintains the security and functionality improvements from upstream while preserving compatibility with the existing downstream patches.

Fixes https://github.com/packit/jotnar/issues/109

@TomasTomecek marked this pull request as draft August 29, 2025 13:06
@TomasTomecek changed the title from "triage agent can output multiple patches" to "Our agents can handle multiple patches for a single Jira" Aug 29, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly modifies the data models and agent prompts to support multiple patch URLs from the triage agent. However, there are a few issues to address. A critical bug exists in patch_validator.py where the input model is not updated to accept multiple URLs, which will lead to a runtime error. Additionally, the patch validation can be made more efficient by processing URLs concurrently. A more significant concern is that this change breaks the backport_agent, which has not been updated to handle the new list of patch URLs. This breaking change should be addressed, either by updating the consumer in this PR or by making the change backward-compatible.

patch_validator.py (inline comment context):

content = partial_bytes.decode(response.get_encoding() or 'utf-8', errors='ignore')
content += f"\n\n[Content truncated - showing first {max_length} bytes of {response.content_length} total]"
results = []
for url in tool_input.urls:
Contributor

critical

The PatchValidatorInput model defines a single url field, but here you are trying to access tool_input.urls. This will raise an AttributeError at runtime. The PatchValidatorInput model needs to be updated to accept a list of URLs, for example: urls: list[str] = Field(...).
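
A minimal sketch of what the updated input model could look like (assuming Pydantic v2; the field description and the min_length constraint are illustrative, not the project's actual definition):

from pydantic import BaseModel, Field


class PatchValidatorInput(BaseModel):
    # Replace the single `url` field with a list so `tool_input.urls` exists.
    urls: list[str] = Field(
        min_length=1,
        description="Patch URLs to validate",
    )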

Comment on lines 69 to 115
results = []
for url in tool_input.urls:
    url = url.strip()

    is_issue, reason = self._is_issue_reference(url)

    status_code = None
    is_accessible = False
    content = None

    if not is_issue:
        try:
            timeout = aiohttp.ClientTimeout(total=30)
            async with aiohttp.ClientSession(timeout=timeout) as session:
                async with session.get(url) as response:
                    status_code = response.status
                    is_accessible = response.status < 400

                    if is_accessible:
                        max_length = MAX_CONTENT_LENGTH
                        if response.content_length and response.content_length > max_length:
                            # Avoid reading a huge response into memory.
                            partial_bytes = await response.content.read(max_length)
                            content = partial_bytes.decode(response.get_encoding() or 'utf-8', errors='ignore')
                            content += f"\n\n[Content truncated - showing first {max_length} bytes of {response.content_length} total]"
                        else:
                            # For responses without content-length or smaller ones, read fully and then truncate.
                            raw_content = await response.text()
                            content = self._truncate_content(raw_content)
                        reason = "URL is accessible and content fetched successfully"
                    else:
                        reason = f"Not an issue reference but not accessible (HTTP {response.status})"

        except asyncio.TimeoutError:
            reason = "Not an issue reference but request timeout"
        except Exception as e:
            reason = f"Not an issue reference but raised exception: {str(e)}"

    results.append(PatchValidatorResult(
        url=url,
        is_accessible=is_accessible,
        status_code=status_code,
        content=content,
        reason=reason,
    ))

return PatchValidatorOutput(results)
Contributor

high

The current implementation processes URLs sequentially in a for loop. Since this is an async function, you can significantly improve performance by processing the URLs concurrently. Consider using asyncio.gather to run the network requests for all URLs in parallel. This will make the tool much faster when validating multiple patches.
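
A rough sketch of the concurrent variant (the helper name and the dict return shape are simplified placeholders, not the tool's real PatchValidatorResult handling):

import asyncio

import aiohttp


async def _validate_one(session: aiohttp.ClientSession, url: str) -> dict:
    # Per-URL logic from the loop above, reduced to a bare accessibility check.
    try:
        async with session.get(url) as response:
            return {"url": url, "status_code": response.status, "is_accessible": response.status < 400}
    except Exception as exc:  # includes asyncio.TimeoutError
        return {"url": url, "status_code": None, "is_accessible": False, "reason": str(exc)}


async def validate_all(urls: list[str]) -> list[dict]:
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # asyncio.gather fires all requests concurrently instead of awaiting them one by one.
        return await asyncio.gather(*(_validate_one(session, url.strip()) for url in urls))

The caller could then wrap each returned dict into a PatchValidatorResult before building PatchValidatorOutput.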

@TomasTomecek (Member Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to allow agents to handle multiple patches for a single Jira issue, which is a valuable feature. The changes involve updating BackportData to accept a list of patch_urls instead of a single URL and adjusting the PatchValidatorResult to include the URL it validated. While the changes are in the right direction, introducing a list of URLs is a breaking change. I've found that backport_agent.py has not been updated to handle this list, which will cause runtime failures. I've left a critical comment regarding this. I've also provided a couple of suggestions to improve the robustness and the formatting of the generated comments.

Comment on lines +133 to +136
patch_urls: list[str] = Field(
description="A list of URLs to the sources of the fixes that were validated using the PatchValidator tool")
Contributor

critical

This change from a single patch_url to a list patch_urls is a significant breaking change. The rest of the codebase does not seem to be updated to handle this, which will likely cause runtime errors.

For example, agents/backport_agent.py still references backport_data.patch_url on line 536, which no longer exists. The run_workflow function in that file is designed to handle only a single upstream_fix string. This PR should probably include the necessary changes in the agents to handle multiple patches, or this change should be behind a feature flag to avoid breaking the main branch.

Additionally, to improve robustness, you should ensure that patch_urls is never an empty list. You can enforce this with Pydantic's validation by adding min_length=1 to the Field definition.

Suggested change

- patch_urls: list[str] = Field(
-     description="A list of URLs to the sources of the fixes that were validated using the PatchValidator tool")
+ patch_urls: list[str] = Field(
+     min_length=1,
+     description="A list of URLs to the sources of the fixes that were validated using the PatchValidator tool")

case BackportData():
fix_version_text = f"\n*Fix Version*: {self.data.fix_version}" if self.data.fix_version else ""

patch_urls_text = "\n".join([f"*Patch URL {i+1}*: {url}" for i, url in enumerate(self.data.patch_urls)])
Contributor

medium

For better readability in the generated Jira comment, it would be good to handle the case of a single patch URL differently. When there's only one URL, using *Patch URL*: ... is cleaner than *Patch URL 1*: .... When there are multiple URLs, the numbered list is great.

Suggested change

- patch_urls_text = "\n".join([f"*Patch URL {i+1}*: {url}" for i, url in enumerate(self.data.patch_urls)])
+ patch_urls_text = f"*Patch URL*: {self.data.patch_urls[0]}" if len(self.data.patch_urls) == 1 else "\n".join(f"*Patch URL {i+1}*: {url}" for i, url in enumerate(self.data.patch_urls))

@nforro (Member) commented Nov 4, 2025

Unlike Gemini, I would rather suggest a formatted list:

*Patch URLs:*
  - https://example.com/patch1
  - https://example.com/patch2

(or whatever Jira supports).
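
If that format were adopted, the rendering code could build it with something along these lines (a sketch only; the surrounding match statement and the patch_urls field come from the snippets above):

def format_patch_urls(patch_urls: list[str]) -> str:
    # Jira wiki-style bold heading followed by a dash-bulleted list of URLs.
    return "\n".join(["*Patch URLs:*"] + [f"  - {url}" for url in patch_urls])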

@TomasTomecek marked this pull request as ready for review November 3, 2025 16:16
@TomasTomecek (Member Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the backport agent to handle multiple upstream patches for a single Jira issue, which is a great enhancement. The changes are mostly consistent, replacing upstream_fix with upstream_patches across the codebase, including Makefiles, documentation, agent prompts, and data models.

I've found a few critical issues that need attention:

  • The agent instructions for the fallback GIT AM WORKFLOW are contradictory and will not work correctly with multiple patches.
  • There's a type mismatch introduced in common/models.py that will cause a runtime error when caching merge request metadata.
  • An edge case when handling the UPSTREAM_PATCHES environment variable could lead to a crash if it's an empty string.

My detailed comments provide suggestions to fix these issues. Once addressed, this will be a solid contribution.

common/models.py Outdated
  title: str = Field(description="Merge request title")
  package: str = Field(description="Package name")
- details: str = Field(description="Operation-specific identifier (upstream_fix URL for backport, version for rebase)")
+ details: list[str] = Field(description="Operation-specific identifier (list of upstream patch URLs for backport, version for rebase)")
Contributor

critical

Changing details from str to list[str] in CachedMRMetadata creates a type conflict. The cache_mr_metadata function in agents/tasks.py is called with a list[str] from backport_agent.py, but its signature expects details: str. Furthermore, inside cache_mr_metadata, it calls details.encode(), which will fail on a list. Finally, it attempts to instantiate CachedMRMetadata with a str for details, which now expects a list[str]. The cache_mr_metadata function needs to be updated to handle a list of strings for backports. A possible solution is to join the list into a stable string representation before hashing.
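
A sketch of the normalization the comment suggests (the real cache_mr_metadata lives in agents/tasks.py and may hash differently; this only shows the join-before-hash idea, with a hypothetical helper name):

import hashlib


def details_cache_key(details: str | list[str]) -> str:
    # Backports now pass a list of patch URLs; rebases still pass a version string.
    # Join lists into a stable, order-preserving string before hashing.
    if isinstance(details, list):
        details = ",".join(details)
    return hashlib.sha256(details.encode()).hexdigest()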

@TomasTomecek (Member Author)

nice, Cursor did that

and (upstream_patches_raw := os.getenv("UPSTREAM_PATCHES", None))
and (jira_issue := os.getenv("JIRA_ISSUE", None))
):
upstream_patches = upstream_patches_raw.split(",")
Contributor

high

If the UPSTREAM_PATCHES environment variable is set to an empty string, upstream_patches_raw.split(',') will result in ['']. This will cause the agent to attempt to download from an empty URL, which will likely fail. You should filter out empty strings after splitting.

Suggested change

- upstream_patches = upstream_patches_raw.split(",")
+ upstream_patches = [p.strip() for p in upstream_patches_raw.split(",") if p.strip()]

@TomasTomecek (Member Author)

it can't be empty mate

@TomasTomecek (Member Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request successfully extends the backport agent to handle multiple patches for a single Jira issue. The changes across the Makefile, README, and Python models are consistent with this goal.

I've identified two issues in agents/backport_agent.py:

  1. A critical bug where a list of patch URLs is passed to a function expecting a string, which will lead to a runtime error.
  2. A high-severity issue where the logic for staging files for commit does not handle the multiple patch files generated by the git am workflow.

Additionally, there appears to be a contradiction in the agent's instructions (get_instructions). The cherry-pick workflow is instructed to generate a single combined patch file, but a later step instructs the agent to add a new Patch tag for every upstream patch, which implies multiple patch files. This could confuse the agent and should be clarified.

Overall, the changes are in the right direction, but the identified issues should be addressed to ensure the new functionality works correctly and robustly.

  operation_type="backport",
  package=state.package,
- details=state.upstream_fix,
+ details=state.upstream_patches,
Contributor

critical

The cache_mr_metadata function expects the details parameter to be a string, but state.upstream_patches is a list of strings. This will cause a runtime AttributeError: 'list' object has no attribute 'encode' when details.encode() is called within cache_mr_metadata. You should convert the list of patch URLs into a single string before passing it. Joining with a comma would be consistent with how the UPSTREAM_PATCHES environment variable is parsed.

Suggested change

- details=state.upstream_patches,
+ details=",".join(state.upstream_patches),

@TomasTomecek (Member Author)

I decided to do str(patches) in the meantime

Comment on lines 222 to 223
Their name is {{jira_issue}}-N.patch where N is the index starting from 0.
For example, the first patch is at {{local_clone}}/{{jira_issue}}-0.patch
Member

This is not related to this PR, but what exactly should {{local_clone}}/{{jira_issue}} represent here? These are instructions, there is no templating, no substitution. The model can make something from {{jira_issue}}, but it should be <JIRA_ISSUE>. {{local_clone}} means nothing though, in the prompt it is substituted with a path and referenced as "working directory", not "local clone".

@TomasTomecek (Member Author)

very good points, will drop the templating then

Comment on lines +135 to +136
patch_urls: list[str] = Field(
description="A list of URLs to the sources of the fixes that were validated using the PatchValidator tool")
Member

upstream_patches is a list but it's not treated as one in the prompt template. I can see that a direct substitution renders as e.g. ['patch1', 'patch2'], which is probably fine for the models to read, so take this just as a FYI. The templating docs are here: https://mustache.github.io/mustache.5.html

@TomasTomecek (Member Author)

okay, I didn't know we are following some syntax 🤦

@TomasTomecek (Member Author)

I thought you meant the <> syntax in system prompt, not the prompt templating, now I understand

@TomasTomecek (Member Author)

I added the list syntax.

Member

> I thought you meant the <> syntax in system prompt

No, that's just a way how to express variables/placeholders, it's not defined anywhere.

  make JIRA_ISSUE=RHEL-12345 run-triage-agent-standalone
  make PACKAGE=httpd VERSION=2.4.62 JIRA_ISSUE=RHEL-12345 BRANCH=c10s run-rebase-agent-standalone
- make PACKAGE=httpd UPSTREAM_FIX=https://github.com/... JIRA_ISSUE=RHEL-12345 BRANCH=c10s run-backport-agent-standalone
+ make PACKAGE=httpd UPSTREAM_PATCHES=https://github.com/... JIRA_ISSUE=RHEL-12345 BRANCH=c10s run-backport-agent-standalone
Member

The example should point out that the patches are supposed to be comma-delimited.
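
For instance (hypothetical URLs), something like UPSTREAM_PATCHES=https://example.com/patch1,https://example.com/patch2 would make the comma-separated format explicit.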

Signed-off-by: Tomas Tomecek <[email protected]>
Assisted-by: Cursor(Claude)
@nforro (Member) left a comment

Just a few nitpicks, feel free to ignore.

for standalone, you now need to pass UPSTREAM_PATCHES and not
UPSTREAM_FIX

Resolves packit/jotnar#109

Signed-off-by: Tomas Tomecek <[email protected]>
Co-authored-by: Nikola Forró <[email protected]>
Assisted-by: Cursor(Claude)
@TomasTomecek (Member Author)

> feel free to ignore.

I don't know how to do that 😅

Committed, squashed, will merge once all CI is green.

@TomasTomecek merged commit 69a6004 into packit:main Nov 4, 2025
7 checks passed
@TomasTomecek deleted the issue109-multipatch branch November 4, 2025 18:20