Our agents can handle multiple patches for a single Jira #115
Conversation
Code Review
This pull request correctly modifies the data models and agent prompts to support multiple patch URLs from the triage agent. However, there are a few issues to address. A critical bug exists in patch_validator.py where the input model is not updated to accept multiple URLs, which will lead to a runtime error. Additionally, the patch validation can be made more efficient by processing URLs concurrently. A more significant concern is that this change breaks the backport_agent, which has not been updated to handle the new list of patch URLs. This breaking change should be addressed, either by updating the consumer in this PR or by making the change backward-compatible.
```python
results = []
for url in tool_input.urls:
    url = url.strip()

    is_issue, reason = self._is_issue_reference(url)

    status_code = None
    is_accessible = False
    content = None

    if not is_issue:
        try:
            timeout = aiohttp.ClientTimeout(total=30)
            async with aiohttp.ClientSession(timeout=timeout) as session:
                async with session.get(url) as response:
                    status_code = response.status
                    is_accessible = response.status < 400

                    if is_accessible:
                        max_length = MAX_CONTENT_LENGTH
                        if response.content_length and response.content_length > max_length:
                            # Avoid reading a huge response into memory.
                            partial_bytes = await response.content.read(max_length)
                            content = partial_bytes.decode(response.get_encoding() or 'utf-8', errors='ignore')
                            content += f"\n\n[Content truncated - showing first {max_length} bytes of {response.content_length} total]"
                        else:
                            # For responses without content-length or smaller ones, read fully and then truncate.
                            raw_content = await response.text()
                            content = self._truncate_content(raw_content)
                        reason = "URL is accessible and content fetched successfully"
                    else:
                        reason = f"Not an issue reference but not accessible (HTTP {response.status})"

        except asyncio.TimeoutError:
            reason = "Not an issue reference but request timeout"
        except Exception as e:
            reason = f"Not an issue reference but raised exception: {str(e)}"

    results.append(PatchValidatorResult(
        url=url,
        is_accessible=is_accessible,
        status_code=status_code,
        content=content,
        reason=reason,
    ))

return PatchValidatorOutput(results)
```
The current implementation processes URLs sequentially in a for loop. Since this is an async function, you can significantly improve performance by processing the URLs concurrently. Consider using asyncio.gather to run the network requests for all URLs in parallel. This will make the tool much faster when validating multiple patches.
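As a minimal sketch of the suggested pattern: the network call is abstracted behind an injected `fetch` coroutine (hypothetical, so the concurrency is visible without the `aiohttp` details), and `asyncio.gather` runs one task per URL concurrently while preserving input order:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ValidationResult:
    url: str
    is_accessible: bool
    reason: str

async def validate_all(urls: list[str], fetch) -> list[ValidationResult]:
    # fetch(url) is a coroutine returning an HTTP status code; injected
    # here so the concurrency pattern is visible without aiohttp details.
    async def validate_one(url: str) -> ValidationResult:
        try:
            status = await fetch(url.strip())
            return ValidationResult(url, status < 400, f"HTTP {status}")
        except asyncio.TimeoutError:
            return ValidationResult(url, False, "request timeout")

    # gather preserves argument order, so results[i] corresponds to urls[i]
    return await asyncio.gather(*(validate_one(u) for u in urls))
```

With one shared `aiohttp.ClientSession` created outside the loop, the same shape applies directly to the real tool.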
Force-pushed from 5861389 to 6a553eb

/gemini review
Code Review
This pull request aims to allow agents to handle multiple patches for a single Jira issue, which is a valuable feature. The changes involve updating BackportData to accept a list of patch_urls instead of a single URL and adjusting the PatchValidatorResult to include the URL it validated. While the changes are in the right direction, introducing a list of URLs is a breaking change. I've found that backport_agent.py has not been updated to handle this list, which will cause runtime failures. I've left a critical comment regarding this. I've also provided a couple of suggestions to improve the robustness and the formatting of the generated comments.
```python
patch_urls: list[str] = Field(
    description="A list of URLs to the sources of the fixes that were validated using the PatchValidator tool")
```
This change from a single patch_url to a list patch_urls is a significant breaking change. The rest of the codebase does not seem to be updated to handle this, which will likely cause runtime errors.
For example, agents/backport_agent.py still references backport_data.patch_url on line 536, which no longer exists. The run_workflow function in that file is designed to handle only a single upstream_fix string. This PR should probably include the necessary changes in the agents to handle multiple patches, or this change should be behind a feature flag to avoid breaking the main branch.
Additionally, to improve robustness, you should ensure that patch_urls is never an empty list. You can enforce this with Pydantic's validation by adding min_length=1 to the Field definition.
```diff
 patch_urls: list[str] = Field(
+    min_length=1,
     description="A list of URLs to the sources of the fixes that were validated using the PatchValidator tool")
```
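If backward compatibility is preferred over updating every consumer in this PR, one option (a sketch, not the PR's actual approach) is to normalize input so both the legacy `patch_url` and the new `patch_urls` are accepted; with Pydantic this logic would live in a `model_validator(mode="before")`, shown here as a plain helper:

```python
def normalize_patch_urls(data: dict) -> list[str]:
    # Prefer the new list field; fall back to the legacy single URL.
    if data.get("patch_urls"):
        return list(data["patch_urls"])
    if data.get("patch_url"):
        return [data["patch_url"]]
    # Mirrors the min_length=1 constraint suggested above.
    raise ValueError("at least one patch URL is required")
```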
```python
case BackportData():
    fix_version_text = f"\n*Fix Version*: {self.data.fix_version}" if self.data.fix_version else ""

    patch_urls_text = "\n".join([f"*Patch URL {i+1}*: {url}" for i, url in enumerate(self.data.patch_urls)])
```
For better readability in the generated Jira comment, it would be good to handle the case of a single patch URL differently. When there's only one URL, using *Patch URL*: ... is cleaner than *Patch URL 1*: .... When there are multiple URLs, the numbered list is great.
```diff
-patch_urls_text = "\n".join([f"*Patch URL {i+1}*: {url}" for i, url in enumerate(self.data.patch_urls)])
+patch_urls_text = f"*Patch URL*: {self.data.patch_urls[0]}" if len(self.data.patch_urls) == 1 else "\n".join(f"*Patch URL {i+1}*: {url}" for i, url in enumerate(self.data.patch_urls))
```
Unlike Gemini, I would rather suggest a formatted list:

```
*Patch URLs:*
- https://example.com/patch1
- https://example.com/patch2
```

(or whatever Jira supports).
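A sketch of that formatting (hypothetical helper name; Jira wiki markup accepts `-` list bullets), combining Gemini's single-URL case with the bulleted list for multiple URLs:

```python
def format_patch_urls(urls: list[str]) -> str:
    # Single URL: keep the simple labeled line.
    if len(urls) == 1:
        return f"*Patch URL*: {urls[0]}"
    # Multiple URLs: a header followed by a bulleted list.
    return "\n".join(["*Patch URLs*:"] + [f"- {url}" for url in urls])
```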
Force-pushed from 6a553eb to 792b311

/gemini review
Code Review
This pull request refactors the backport agent to handle multiple upstream patches for a single Jira issue, which is a great enhancement. The changes are mostly consistent, replacing upstream_fix with upstream_patches across the codebase, including Makefiles, documentation, agent prompts, and data models.
I've found a few critical issues that need attention:
- The agent instructions for the fallback `GIT AM WORKFLOW` are contradictory and will not work correctly with multiple patches.
- There's a type mismatch introduced in `common/models.py` that will cause a runtime error when caching merge request metadata.
- An edge case when handling the `UPSTREAM_PATCHES` environment variable could lead to a crash if it's an empty string.

My detailed comments provide suggestions to fix these issues. Once addressed, this will be a solid contribution.
common/models.py (Outdated)

```diff
 title: str = Field(description="Merge request title")
 package: str = Field(description="Package name")
-details: str = Field(description="Operation-specific identifier (upstream_fix URL for backport, version for rebase)")
+details: list[str] = Field(description="Operation-specific identifier (list of upstream patch URLs for backport, version for rebase)")
```
Changing details from str to list[str] in CachedMRMetadata creates a type conflict. The cache_mr_metadata function in agents/tasks.py is called with a list[str] from backport_agent.py, but its signature expects details: str. Furthermore, inside cache_mr_metadata, it calls details.encode(), which will fail on a list. Finally, it attempts to instantiate CachedMRMetadata with a str for details, which now expects a list[str]. The cache_mr_metadata function needs to be updated to handle a list of strings for backports. A possible solution is to join the list into a stable string representation before hashing.
nice, Cursor did that
```python
    and (upstream_patches_raw := os.getenv("UPSTREAM_PATCHES", None))
    and (jira_issue := os.getenv("JIRA_ISSUE", None))
):
    upstream_patches = upstream_patches_raw.split(",")
```
If the UPSTREAM_PATCHES environment variable is set to an empty string, upstream_patches_raw.split(',') will result in ['']. This will cause the agent to attempt to download from an empty URL, which will likely fail. You should filter out empty strings after splitting.
```diff
-upstream_patches = upstream_patches_raw.split(",")
+upstream_patches = [p.strip() for p in upstream_patches_raw.split(",") if p.strip()]
```
it can't be empty mate
Force-pushed from 792b311 to 89dcd22

/gemini review
Code Review
This pull request successfully extends the backport agent to handle multiple patches for a single Jira issue. The changes across the Makefile, README, and Python models are consistent with this goal.
I've identified two issues in agents/backport_agent.py:
- A critical bug where a list of patch URLs is passed to a function expecting a string, which will lead to a runtime error.
- A high-severity issue where the logic for staging files for commit does not handle the multiple patch files generated by the `git am` workflow.
Additionally, there appears to be a contradiction in the agent's instructions (get_instructions). The cherry-pick workflow is instructed to generate a single combined patch file, but a later step instructs the agent to add a new Patch tag for every upstream patch, which implies multiple patch files. This could confuse the agent and should be clarified.
Overall, the changes are in the right direction, but the identified issues should be addressed to ensure the new functionality works correctly and robustly.
agents/backport_agent.py (Outdated)

```diff
 operation_type="backport",
 package=state.package,
-details=state.upstream_fix,
+details=state.upstream_patches,
```
The cache_mr_metadata function expects the details parameter to be a string, but state.upstream_patches is a list of strings. This will cause a runtime AttributeError: 'list' object has no attribute 'encode' when details.encode() is called within cache_mr_metadata. You should convert the list of patch URLs into a single string before passing it. Joining with a comma would be consistent with how the UPSTREAM_PATCHES environment variable is parsed.
```diff
-details=state.upstream_patches,
+details=",".join(state.upstream_patches),
```
I decided to do str(patches) in the meantime
Force-pushed from 89dcd22 to a0faaf8
agents/backport_agent.py (Outdated)

```
Their name is {{jira_issue}}-N.patch where N is the index starting from 0.
For example, the first patch is at {{local_clone}}/{{jira_issue}}-0.patch
```
This is not related to this PR, but what exactly should {{local_clone}}/{{jira_issue}} represent here? These are instructions, there is no templating, no substitution. The model can make something from {{jira_issue}}, but it should be <JIRA_ISSUE>. {{local_clone}} means nothing though, in the prompt it is substituted with a path and referenced as "working directory", not "local clone".
very good points, will drop the templating then
```python
patch_urls: list[str] = Field(
    description="A list of URLs to the sources of the fixes that were validated using the PatchValidator tool")
```
upstream_patches is a list but it's not treated as one in the prompt template. I can see that a direct substitution renders as e.g. ['patch1', 'patch2'], which is probably fine for the models to read, so take this just as a FYI. The templating docs are here: https://mustache.github.io/mustache.5.html
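To illustrate the Mustache syntax from those docs: a bare `{{upstream_patches}}` tag stringifies the list, while a `{{#upstream_patches}}…{{/upstream_patches}}` section iterates it, with `{{.}}` standing for the current item. The toy expander below (not a real Mustache engine, just an illustration of the intended rendering):

```python
import re

def render_patch_section(template: str, patches: list[str]) -> str:
    # Expand {{#upstream_patches}}...{{/upstream_patches}} by repeating
    # the section body once per patch, substituting {{.}} each time.
    def expand(match: re.Match) -> str:
        body = match.group(1)
        return "".join(body.replace("{{.}}", p) for p in patches)
    return re.sub(
        r"\{\{#upstream_patches\}\}(.*?)\{\{/upstream_patches\}\}",
        expand, template, flags=re.S)
```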
okay, I didn't know we are following some syntax 🤦
I thought you meant the <> syntax in system prompt, not the prompt templating, now I understand
I added the list syntax.
> I thought you meant the `<>` syntax in system prompt

No, that's just a way how to express variables/placeholders, it's not defined anywhere.
```diff
 make JIRA_ISSUE=RHEL-12345 run-triage-agent-standalone
 make PACKAGE=httpd VERSION=2.4.62 JIRA_ISSUE=RHEL-12345 BRANCH=c10s run-rebase-agent-standalone
-make PACKAGE=httpd UPSTREAM_FIX=https://github.com/... JIRA_ISSUE=RHEL-12345 BRANCH=c10s run-backport-agent-standalone
+make PACKAGE=httpd UPSTREAM_PATCHES=https://github.com/... JIRA_ISSUE=RHEL-12345 BRANCH=c10s run-backport-agent-standalone
```
The example should point out that the patches are supposed to be comma-delimited.
Signed-off-by: Tomas Tomecek <[email protected]>
Assisted-by: Cursor (Claude)
Force-pushed from efbb67e to 568f4ea
nforro left a comment:
Just a few nitpicks, feel free to ignore.
for standalone, you now need to pass UPSTREAM_PATCHES and not UPSTREAM_FIX

Resolves packit/jotnar#109

Signed-off-by: Tomas Tomecek <[email protected]>
Co-authored-by: Nikola Forró <[email protected]>
Assisted-by: Cursor (Claude)
Force-pushed from c18974b to 45b4a2b
I don't know how to do that 😅

Committed, squashed, will merge once all CI is green.
Got a successful run (comment generated with gemini from the inline console string):

*Status Details*

Successfully backported three upstream commits to the `git-lfs` package for RHEL-122431.

*Cherry-pick Workflow Used*
- https://github.com/git-lfs/git-lfs.git (ea47a34bde1ba1ecf9eb8dc6d44fcf7aef42f5e3)
- `0001-Use-constant-format-string.patch`

*Commits Backported*
- 0cffe93176b870055c9dadbb3cc9a4a440e98396
- d02bd13f02ef76f6807581cd6b34709069cb3615
- 5c11ffce9a4f095ff356bc781e2a031abb46c1a8

*Conflict Resolution Details*

0cffe93176b870055c9dadbb3cc9a4a440e98396:
- `commands/command_checkout.go`: Added `path/filepath` import and integrated directory/symlink conflict checking logic.
- `commands/pull.go`: Added directory walker checks before decoding pointers, integrated path creation logic.
- `lfs/gitfilter_smudge.go`: Changed from `os.Stat` to `os.Lstat` with `IsRegular()` check to handle symlinks properly.
- `t/t-checkout.sh`: Updated test path and added two new test cases.
- `t/t-pull.sh`: Updated test expectations and added two new test cases.

5c11ffce9a4f095ff356bc781e2a031abb46c1a8:
- `lfs/gitfilter_smudge.go`: Re-added `tools.MkdirAll` call needed for directory creation.
- `t/t-checkout.sh`: Removed duplicate conflict markers.
- `t/t-pull.sh`: Removed duplicate conflict markers.

*Patch Integration Summary*
- `RHEL-122431.patch`
- `centpkg prep` (exit 0)
- `git-lfs-3.6.1-3.el9.src.rpm` (3.5MB)

Fixes https://github.com/packit/jotnar/issues/109