PM-33385 - Craft a multi-agent code review #96
Great job! No new security vulnerabilities introduced in this pull request.
Co-authored-by: Copilot <copilot@github.com>
```json
"metadata": {
  "description": "Official Bitwarden Claude Plugin Marketplace",
  "version": "1.0.1",
  "pluginRoot": "./plugins"
```
I have been experimenting with cross-agent configuration (namely GitHub Copilot) and found that this configuration is incorrect. `pluginRoot` should only be used if the plugins below did not have the `./plugins/` prefix in their `source` property. However, since all of them do, this should be removed.
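A sketch of what I believe the corrected shape looks like, with `pluginRoot` dropped because each plugin's `source` already carries the `./plugins/` prefix (the plugin entry shown is illustrative, not copied from the actual marketplace file):

```json
{
  "metadata": {
    "description": "Official Bitwarden Claude Plugin Marketplace",
    "version": "1.0.1"
  },
  "plugins": [
    {
      "name": "performing-multi-agent-code-review",
      "source": "./plugins/performing-multi-agent-code-review"
    }
  ]
}
```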
> - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt; Line Number Accuracy is propagated verbatim into every Step 2–5 subagent prompt.
> - **READ** `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt.
> 2. Launch a single architecture & pattern compliance agent using the `bitwarden-tech-lead` subagent type. Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description.
When working locally against fellow teammates' feature branches, I have found this particular subagent to be very useful. It has pointed out where a teammate unintentionally violated a CLAUDE.md or README.md rule; the agents that only scan the diffs aren't finding those.
> Execute these steps in order. Do not skip, reorder, or combine steps.
> 1. Gather context (no subagents). All `references/...` paths below resolve relative to this skill's directory — do not search elsewhere.
I spent the most time working to refine and clearly tell the main agent (the orchestrator, for lack of a better name) what context it must gather and pass to subagents, because they work independently with their own context windows.
> This is not an exhaustive checklist — surface anything diff-visible that a senior engineer would flag in a real review.
> ## Line Number Accuracy
The first few iterations of the skill saw Claude Code deliver some funky line numbers, so I added this brief, direct instruction to guide it and have seen much better results.
As noted in the pull request description, these standards are intentionally different because of the depth of work being done by the subagents. The intention is that we don't surface noise to engineers, but real signals of problems. I feel strongly that humans should be relied upon for suggestions and nit-picks. Claude should stay in this lane IMO.
> - Speculative issues that depend on specific inputs or runtime state without evidence those inputs occur in practice.
> - Pre-existing issues not introduced or worsened by this change.
> ## Confidence Scoring
I have found good success with providing Claude with a scale that has concrete descriptions of what is and is not considered a confident finding.
Arguably the best suggestion from the /skill-creator was this markdown. Giving the subagents a concrete messaging format has drastically improved consistency of findings, and being able to track which agent came up with which finding is also very helpful.
Given that the skill is designed to be very flexible, keeping it all in the main SKILL.md became overwhelming.
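For concreteness, a single finding in this message format might look roughly like the following; the exact shape lives in `references/finding-shape.md`, so treat everything here (the id scheme, the file path, the wording) as an illustrative guess rather than the real format:

```json
{
  "id": "bug-003",
  "source_agent": "bug-analysis",
  "severity": "high",
  "confidence": 4,
  "file": "src/auth/session-timeout.ts",
  "line": 112,
  "summary": "Session timeout comparison uses seconds where the config value is milliseconds.",
  "status": "open"
}
```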
> 6. Merge all Step 4 and Step 5 returns by `id` into the master finding map. Creation-time fields are immutable (see `references/finding-shape.md`). For dismissed findings, set `dismissal_stage` to `"Step 4 validation"` or `"Step 5 severity audit"` based on which step set the dismissal status — it renders as `**Dismissed at:**`. Partition by final status: validated (Step 5 `confirmed` or `downgraded`) becomes the main Findings section; dismissed (Step 4 `dismissed` or Step 5 `dismissed`) preserves original severity, original confidence, dismissal stage, and dismissal reason for rendering in the Dismissed block.
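The merge-and-partition logic in this step can be sketched in a few lines of Python (names beyond `id`, `status`, and `dismissal_stage` are my own; the real field set is defined in `references/finding-shape.md`):

```python
def merge_and_partition(master, step4_returns, step5_returns):
    """Fold Step 4/5 returns into the master finding map by id, then partition.

    Creation-time fields on each finding are never overwritten; only the
    status and dismissal metadata are updated.
    """
    for stage, returns in (("Step 4 validation", step4_returns),
                           ("Step 5 severity audit", step5_returns)):
        for ret in returns:
            finding = master[ret["id"]]
            finding["status"] = ret["status"]
            if ret["status"] == "dismissed":
                finding["dismissal_stage"] = stage   # renders as **Dismissed at:**
                finding["dismissal_reason"] = ret.get("reason", "")

    # Validated findings go in the main Findings section; dismissed ones in
    # the Dismissed block.
    validated = [f for f in master.values() if f["status"] in ("confirmed", "downgraded")]
    dismissed = [f for f in master.values() if f["status"] == "dismissed"]
    return validated, dismissed
```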
> 7. Format the report using the template in `references/report-template.md`. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them; per-finding traceability requires the full set.
Another clear distinction from the single agent reviews and commands. I don't want inline comments and I don't want a second file. I want to see a single, clear report with collapsed sections as needed so that engineers can work the report top-to-bottom.
Adding the `Caught by:` and the `Original ...` fields was instrumental in working with the /skill-creator skill to diagnose inconsistencies and tune the subagent prompts.
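As a sketch of how those fields drive the per-finding rendering described in Step 7, assuming hypothetical field names (the real ones live in `references/finding-shape.md`, and the real label table in `references/report-template.md`):

```python
FRIENDLY_LABELS = {
    # source_agent -> friendly label; this mapping is assumed, the actual
    # table is in references/report-template.md
    "bug-analysis": "Bug analysis",
    "security": "Security & logic",
}

def render_finding(finding):
    """Render one finding; dismissed findings carry four extra trace lines."""
    lines = [
        f"{finding['file']}:{finding['line']}",
        f"**Caught by:** {FRIENDLY_LABELS.get(finding['source_agent'], finding['source_agent'])}",
    ]
    if finding["status"] == "dismissed":
        lines += [
            f"**Original severity:** {finding['original_severity']}",
            f"**Original confidence:** {finding['original_confidence']}",
            f"**Dismissed at:** {finding['dismissal_stage']}",
            f"**Dismissed because:** {finding['dismissal_reason']}",
        ]
    return "\n".join(lines)
```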
> <!-- Only if there are rejected findings. Omit entirely if all confirmed. -->
> ## Reviewed and Dismissed
One of my favorite parts of the skill is this addition.
When creating the Seeder Utility in the server repo and refining the skill itself, it was very helpful to keep these around in a simple expand/collapse section to better understand what was going on in the process. Must keep IMO.
🤖 Bitwarden Claude Code Review
Overall Assessment: APPROVE
Reviewed the addition of the new
Code Review Details
No new findings. Previously raised review comments (changelog cruft, README typo,
Note: I am intentionally putting `actions/workflows/review-code.yml` through its paces. I am testing my recent change that triggers the workflow via applying the
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

🎟️ Tracking
PM-33385 - Create multi-agent code review skill
📔 Objective
Adds a new `performing-multi-agent-code-review` skill. Intentionally a different lens from the existing `bitwarden-code-reviewer` agent — that agent reviews code the way a human reviewer would; this skill has Claude evaluate code as Claude, with output constraints a human can read. The pipeline runs 6 subagents per review, each with a specific purpose and well-defined prompt.

Nitty-gritty skill details
- `--model` to dial cost vs. depth.
- Discovery agents (`quality`, `bug`, `security`) → per-finding validation → severity audit.
- `Caught by` sub-agent attribution (Architecture, Code quality, Bug analysis, Security & logic, Validation).

Use cases
- At the end of a [Claude Code Agent Teams](https://code.claude.com/docs/en/agent-teams) coding session — slots into an agent feedback loop where teammates work, address findings, and refactor.
- When an engineer is approaching the end of local development work and needs depth of review.
- When an engineer has completed work on a draft PR and needs depth of review prior to publish.
- When an engineer is peer-reviewing a high-density, cross-team, or very complex pull request.

Field test results on live PRs
Cost estimates
Cost comparison sdk-internal/pull/1020

- multi-agent review
- single-agent `/bitwarden-code-review:code-review-local`

Cost comparison misc/pull/348

- multi-agent review
- single-agent review
🧪 Testing
Expand for reviewer steps to try the four modes locally
Step 1: Install the required sibling plugins
The skill aborts without `bitwarden-tech-lead` and `bitwarden-security-engineer` available in the same marketplace. Confirm both are installed before invoking.
In a Bitwarden repo checkout, invoke each form below and confirm a `code-review-*.md` lands at the repo root with severity-bucketed findings and a `Caught by:` line on every finding:

- `/performing-multi-agent-code-review 12345` (substitute a real PR number)
- `/perform-multi-agent-code-review`
- `/perform-multi-agent-code-review`
- `/performing-multi-agent-code-review on the last 5 commits`
Each report should include a collapsed "dismissed findings" section listing findings that didn't survive Step 4 validation or Step 5 severity audit, with the rejection reason.