@copejon
Contributor

@copejon copejon commented Nov 24, 2025

Adds an initial agent capable of analyzing CI failures in Prow. The agent follows a methodical workflow for failure analysis:

  1. Create a list of the errors and failures found in the build.log.
  2. Characterize each error and failure using context from the build log, and use this to determine whether it is an infra issue, a MicroShift runtime error, or a legitimate test failure.
  3. Investigate further depending on the nature of the error:
    • For legitimate test errors, analyze the test logs.
    • For runtime errors, download and analyze the sos report.
  4. Produce a report based on the findings of step 3.
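
As a rough illustration of steps 1 to 3, the triage could be sketched in Python as below. This is not the agent's implementation (the agent is driven by prompts, not by this code): the keyword lists, the `Finding` type, and the `triage` function are all hypothetical, and the PR does not describe a follow-up step for infra issues, so that entry is a placeholder.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword heuristics, chosen only for illustration.
# The agent's real classification criteria are not shown in this PR.
INFRA_HINTS = ("timed out", "connection refused", "no space left")
RUNTIME_HINTS = ("microshift.service", "panic:", "crashloopbackoff")

# Step 1: a line is a candidate if it mentions an error or failure.
ERROR_LINE = re.compile(r"\b(?:error|fail(?:ed|ures?)?)\b", re.IGNORECASE)

# Step 3: follow-up per category. The "test" and "runtime" entries come from
# the workflow above; the "infra" entry is an assumed placeholder.
NEXT_STEP = {
    "infra": "report as an infra issue (no follow-up described in the PR)",
    "runtime": "download and analyze the sos report",
    "test": "analyze the test logs",
}


@dataclass
class Finding:
    line_no: int
    text: str
    category: str  # "infra", "runtime", or "test"
    next_step: str


def classify(text: str) -> str:
    """Step 2: characterize one log line using simple keyword matching."""
    lowered = text.lower()
    if any(hint in lowered for hint in INFRA_HINTS):
        return "infra"
    if any(hint in lowered for hint in RUNTIME_HINTS):
        return "runtime"
    return "test"


def triage(build_log: str) -> list[Finding]:
    """Collect error lines from a build log and attach a follow-up step."""
    findings = []
    for line_no, text in enumerate(build_log.splitlines(), start=1):
        if ERROR_LINE.search(text):
            category = classify(text)
            findings.append(
                Finding(line_no, text.strip(), category, NEXT_STEP[category])
            )
    return findings
```

Calling `triage(open("build.log").read())` would return one `Finding` per matching line, which could then feed the report in step 4.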

To invoke the agent, pass the Prow job's URL to claude, e.g.

$ claude https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_microshift/5596/pull-ci-openshift-microshift-main-e2e-aws-tests-arm/1995881118070476800
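
For readers unfamiliar with the URL shape, here is a minimal Python sketch that splits a job URL like the one above into its parts. It assumes that the segment after `/view/gs/` is the storage bucket and that the last two path segments are the job name and build ID; that reading is inferred from the example URL, and `parse_prow_url` is a hypothetical helper, not something this PR provides.

```python
from urllib.parse import urlparse

VIEW_PREFIX = "/view/gs/"


def parse_prow_url(url: str) -> dict:
    """Split a Prow job URL into bucket, object path, job name, and build ID.

    Assumes the layout /view/gs/<bucket>/<...>/<job_name>/<build_id>,
    which is inferred from the example URL rather than from Prow's docs.
    """
    path = urlparse(url).path
    if not path.startswith(VIEW_PREFIX):
        raise ValueError(f"not a Prow job view URL: {url}")
    parts = path[len(VIEW_PREFIX):].strip("/").split("/")
    if len(parts) < 3:
        raise ValueError(f"URL path is too short: {url}")
    return {
        "bucket": parts[0],
        "object_path": "/".join(parts[1:]),
        "job_name": parts[-2],
        "build_id": parts[-1],
    }
```

A helper like this would let a script locate the job's artifacts (for example the build.log) before handing them to the agent.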

There's plenty of room for improvement here. For future contributions, consider:

  • Delegation: use sub-agents to perform specialized, lower-level analysis (sos-report agent, MicroShift source code agent, etc.). This is especially useful for scoping each agent's context to its task.
  • Additional workflow steps: e.g. after identifying a legitimate test failure, analyze the MicroShift code base (or the diff, for PRs) to determine where the error was introduced.
  • Honing suggested remediations: in this PR, the agent is given little direction on how to fix errors and bases its recommendations on the context it is given.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 24, 2025
@copejon
Contributor Author

copejon commented Nov 24, 2025

/test test-unit
/test verify

@openshift-ci
Contributor

openshift-ci bot commented Nov 24, 2025

@copejon: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kasturinarra
Contributor

@copejon hey, should you change this command? $ claude https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_microshift/9999/pull-ci-openshift-microshift-release-4.20-metal-periodic-test/1234567894561234156

I tried to run it using `@openshift-ci-analysis <job_url_name>`

@kasturinarra
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 26, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: copejon, kasturinarra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [copejon,kasturinarra]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -0,0 +1,18 @@
{
"permissions": {
"allow": [
Contributor

Do these allow permissions override the allow-tools from other Claude commands? for example

I'd set permissions individually on each Claude command instead of adding global allow permissions.

Contributor Author

This PR adds an agent, so the permissions have to be specified in the settings.json. That said, the settings.json doesn't override the permissions set on commands.

@copejon copejon force-pushed the no-issue-claude-prow-failure-analyzing-agent branch from 8c01b3f to be7e359 Compare December 2, 2025 17:44
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2025
@openshift-ci
Contributor

openshift-ci bot commented Dec 2, 2025

New changes are detected. LGTM label has been removed.

@copejon
Contributor Author

copejon commented Dec 2, 2025

@copejon hey, should you change this command? $ claude https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_microshift/9999/pull-ci-openshift-microshift-release-4.20-metal-periodic-test/1234567894561234156

I tried to run it using `@openshift-ci-analysis <job_url_name>`

@kasturinarra That's my fault. The url in the description isn't for a real job. Will fix!

Also, this is structured as an agent. Just passing the url to claude (as long as claude is run in the project root) is enough to trigger the agent.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

3 participants