Add initial AI api-review configuration #2489
| Hello @JoelSpeed! Some important instructions when contributing to openshift/api: | 
Just some general questions for learning purposes.
Nothing worth blocking this on IMO, especially if you are finding it useful.
        
          
.claude/commands/api-review.md (outdated)
      | **Explanation:** [Why this change is needed] | ||
| I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR. | 
This looks like something an LLM would spit out when you ask for a review - is it necessary to include this in the instructions?
Ah, this whole section looks like it is something an LLM spit out as an execution plan. Should something like this be hand-rolled with explicit instructions on how to conduct the review and important considerations?
I guess my curiosity here is if we made an explicit guidelines type document that humans could follow, an LLM should be able to follow along relatively easily and we can potentially enforce more nuanced guardrails.
Not worth blocking this on - more so asking questions for myself as I've not worked with LLMs in this capacity before.
This is an artefact of how these documents are built. Using Claude locally, I've given it text-based prompts, and it has converted those into instructions that it can read. So 95% of this document is it translating my instructions and feedback into rules that it can later apply.
Hmm. That is interesting. So this file is automatically updated with more detailed instructions by Claude as the AGENTS.md file is updated?
If you update this file by hand to "improve it", what happens?
You can improve it by hand; that's also fine, and I have made a few edits here and there. But the bulk was generated (then pruned, as it was more verbose) and tested again to see whether it was producing good results.
> is it necessary to include this in the instructions?

I've found that giving explicit examples works quite well when you want a particular output format. Unfortunately, it seems to do best with monkey-see-monkey-do style learning in this department :(

> I guess my curiosity here is if we made an explicit guidelines type document that humans could follow, an LLM should be able to follow along relatively easily and we can potentially enforce more nuanced guardrails.

I think this should work, and is sort of what the `### API review` section in AGENTS.md is, but you would still need something similar to this document for the command, to give it a guide on the output and communication style you want, alongside extra prompts for when it fails to follow the rules...

> If you update this file by hand to "improve it", what happens?

I've found that often, what makes sense to me (as a human reading the document) the AI will ignore, or it may degrade performance. Conversing with the model and asking it what seems to be the issue (why didn't you listen to my instructions?) does seem to help, but yields different structures to what we'd normally expect, e.g. for documentation to be consumed by humans.
This does lead into one of the bigger problems with automating problems like this, which is how do you ensure consistent output and avoid regressions when experimenting with prompts? You've got a probabilistic system you need to build confidence in, and you don't want to rely on manual checks or 'vibes'. Evals are the way the industry seems to be moving, but the OSS tooling all seems pretty heavyweight for automating internal tasks / reviews.
One approach for API review that may work could be directing the command (or a version of it) to output JSON:
```json
{
  "summary": "…",
  "issues": [
    {"rule": "fieldDocumentation", "msg": "...", "lines": "…"},
    {"rule": "optionalFieldBehaviour", "msg": "...", "lines": "…"}
  ],
  "meta": {"model": "claude-xxx", "rules_version": "<commit?>"}
}
```
We could then have sets of real API PRs, against which we can write unit tests that expect the correct issues to be caught. Something like:
- `golden/`: real API doc chunks + your expected issue set (ground truth).
- `synthetic/`: synthetic snippets, each targeting exactly one rule. These are your 'unit tests'.
Then you can compare the existing rules/prompt against the changed version, and hopefully catch any regressions. There are lots of ways to extend this too, e.g. mutating synthetic snippets and still expecting issues to be caught, or using existing reviews (e.g. merged PRs with comment chains + changes requested) and an LLM-as-a-judge style system.
Either way, it's not simple... unfortunately :(
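As a rough sketch of the comparison described above, something like the following could score one review's JSON output against an expected issue set, and flag rules that an older prompt caught but a changed prompt misses. Everything here is an assumption for illustration: the `Score`, `score_review` and `regressions` names are hypothetical, the sample outputs are made up rather than taken from a real run, and matching on the `rule` name alone ignores `msg` and `lines`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Outcome of comparing one review against its expected issue set."""
    caught: frozenset      # expected rules that the review reported
    missed: frozenset      # expected rules that the review did not report
    unexpected: frozenset  # reported rules that were not expected

    @property
    def recall(self) -> float:
        expected = len(self.caught) + len(self.missed)
        return len(self.caught) / expected if expected else 1.0


def score_review(review_json: str, expected_rules: set) -> Score:
    """Compare the rule names in one review's JSON output to the ground truth."""
    review = json.loads(review_json)
    reported = {issue["rule"] for issue in review.get("issues", [])}
    expected = set(expected_rules)
    return Score(
        caught=frozenset(reported & expected),
        missed=frozenset(expected - reported),
        unexpected=frozenset(reported - expected),
    )


def regressions(baseline: Score, candidate: Score) -> frozenset:
    """Rules the baseline prompt caught that the candidate prompt now misses."""
    return baseline.caught & candidate.missed


# Hypothetical outputs for the same snippet, from the old and the changed prompt.
baseline_output = json.dumps({
    "summary": "two issues",
    "issues": [
        {"rule": "fieldDocumentation", "msg": "example", "lines": "10-12"},
        {"rule": "optionalFieldBehaviour", "msg": "example", "lines": "14"},
    ],
    "meta": {"model": "example-model", "rules_version": "example-commit"},
})
candidate_output = json.dumps({
    "summary": "one issue",
    "issues": [
        {"rule": "fieldDocumentation", "msg": "example", "lines": "10-12"},
    ],
    "meta": {"model": "example-model", "rules_version": "example-commit"},
})

expected = {"fieldDocumentation", "optionalFieldBehaviour"}
baseline = score_review(baseline_output, expected)
candidate = score_review(candidate_output, expected)

print(sorted(regressions(baseline, candidate)))  # ['optionalFieldBehaviour']
print(candidate.recall)                          # 0.5
```

Because the output is probabilistic, a single run per snippet probably isn't enough; running each snippet several times and tracking how often each rule is caught would give a steadier signal than one pass/fail result.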
| I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR. | ||
Do we need to create a separate H1 heading to explain to the LLM that this is the steps for how to actually conduct the review?
Weirdly, "I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR." seems to work just fine. I've had instances (e.g. on ccapio) where adding headings led to the agent ignoring them 🤦
It's the sort of thing you want to build something to allow you to test :/
| ### Testing | ||
| ```bash | ||
| make test-unit # Run unit tests | ||
| make integration # Run integration tests (in tests/ directory) | 
Should we teach it how to run the integration tests with more focused arguments?
That way, running the integration tests doesn't take longer than necessary when reviewing a change that only impacts a subset of the APIs/tests.
Yes, that's a good idea, I'll have a go at that
| /lgtm As this will be an iterative process of working out what works, this seems like a good place to start. /hold for focussing integration tests | 
| /test remaining-required Overriding unmatched contexts: | 
| @openshift-ci-robot: Overrode contexts on behalf of openshift-ci-robot: ci/prow/e2e-aws-ovn, ci/prow/e2e-aws-ovn-hypershift, ci/prow/e2e-aws-ovn-hypershift-conformance, ci/prow/e2e-aws-ovn-techpreview, ci/prow/e2e-aws-serial-1of2, ci/prow/e2e-aws-serial-2of2, ci/prow/e2e-aws-serial-techpreview-1of2, ci/prow/e2e-aws-serial-techpreview-2of2, ci/prow/e2e-azure, ci/prow/e2e-gcp, ci/prow/e2e-upgrade, ci/prow/e2e-upgrade-out-of-change |
| Couple of TODOs which I'll leave as a note here, so keep the hold for now: 
 | 
| --- | ||
| name: api-review | ||
| description: Run strict OpenShift API review workflow for PR changes | ||
| parameters: | 
where is this parameters block coming from? I can only see $N args convention in claude docs
Claude generated most of this document itself based on guidance we were giving it through the CLI, I'm pretty certain it wrote this out
See https://docs.claude.com/en/docs/claude-code/slash-commands#parameters
We're just naming them, and explicitly requiring them here, I think. The docs aren't too clear.
| @everettraven I've made some updates to this (separate commits) to ensure it is able to review locally checked out code too. When you have a moment, I think this is ready to be merged so we can start folks on adopting it in the wider sphere. |
/lgtm
| /verified bypass | 
| [APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: everettraven, theobarberbany |
| @JoelSpeed I'll defer to you on removing the hold that seems to be present | 
| /hold cancel | 
| @JoelSpeed: all tests passed! |
This adds an initial AGENTS.md configuration for how to perform API review via an AI agent such as Claude.
It also implements a `/api-review` command that can be used locally to review PRs by anyone who has Claude installed.
I hope we can get folks using this to self help, but my long term goal is to integrate this into coderabbit or some other review tool that can post the comments directly on the PR.
As an example of the output, see #2488 (comment)
Current highlights of its instructions: