
PM-33385 - Craft a multi-agent code review#96

Open
theMickster wants to merge 14 commits into main from multi-agent-review

Conversation


theMickster commented Apr 24, 2026

🎟️ Tracking

PM-33385 - Create multi-agent code review skill

📔 Objective

Adds a new performing-multi-agent-code-review skill. It is intentionally a different lens from the existing bitwarden-code-reviewer agent: that agent reviews code the way a human reviewer would, while this skill has Claude evaluate code as Claude, with output constraints a human can read. The pipeline runs six subagents per review, each with a specific purpose and a well-defined prompt.

Nitty-gritty skill details

  • Four invocation modes: PR, local changes, branch comparison, and commit-range (e.g. "last week of commits", "last 20 commits", "since 2026-04-23").
  • Configurable model — opus by default; tunable via --model to dial cost vs. depth.
  • Pipeline: architecture compliance pass → 3 parallel diff reviewers (quality, bug, security) → per-finding validation → severity audit.
  • Confidence threshold ≥ 80; everything below is dropped silently.
  • The architecture agent is the only codebase-aware lens — the 3 parallel reviewers are deliberately constrained to the diff to avoid context contamination. The architecture agent counterweights by loading full files, sibling code, and project docs, catching pattern violations, boundary violations, and doc/code drift the diff-only reviewers cannot.
  • Severity-bucketed local markdown report written to the repo root.
  • Findings render in 🛑 Blocker / ⚠️ Important / ♻️ Refactor with a collapsed "dismissed findings" block showing what was caught and rejected, with rejection reasons.
  • Each finding includes a Caught by sub-agent attribution (Architecture, Code quality, Bug analysis, Security & logic, Validation).
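The confidence gate and severity bucketing described above can be sketched as follows. This is a minimal illustration with a hypothetical finding shape; the skill's real shape lives in references/finding-shape.md:

```python
from collections import defaultdict

CONFIDENCE_THRESHOLD = 80  # findings scoring below this are dropped silently

def bucket_findings(findings):
    """Drop sub-threshold findings, then group survivors by severity."""
    buckets = defaultdict(list)
    for finding in findings:
        if finding["confidence"] >= CONFIDENCE_THRESHOLD:
            buckets[finding["severity"]].append(finding)
    return buckets

# Hypothetical findings, not real skill output.
findings = [
    {"id": "f1", "severity": "Blocker", "confidence": 95},
    {"id": "f2", "severity": "Refactor", "confidence": 60},  # silently dropped
    {"id": "f3", "severity": "Important", "confidence": 82},
]
buckets = bucket_findings(findings)
```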

Use cases

  • At the end of a [Claude Code Agent Teams](https://code.claude.com/docs/en/agent-teams) coding session — slots into an agent feedback loop where teammates work, address findings, and refactor.
  • When an engineer is approaching the end of local development work and needs depth of review.
  • When an engineer has completed work on a draft PR and needs depth of review prior to publishing.
  • When an engineer is peer-reviewing a high-density, cross-team, or very complex pull request.

Field test results on live PRs

  1. PM-26250 Explore options to enable direct importer for mac app store build clients#17479 (comment)
  2. [PM-32016] - make at-risk banner dismissable clients#20505 (comment)
  3. [PM-34165] Introduce InviteKeyBundle for supporting future AC auto-confirm workflow sdk-internal#1021 (comment)
  4. Add missing functions to Send API calls sdk-internal#961 (comment)
  5. [PM-32211] fix private key before key rotation sdk-internal#994 (comment)

Cost estimates


Cost comparison sdk-internal/pull/1020

multi-agent review: [cost screenshot]

single-agent /bitwarden-code-review:code-review-local: [cost screenshot]

Cost comparison misc/pull/348

multi-agent review: [cost screenshot]

single-agent review: [cost screenshot]

🧪 Testing

Reviewer steps to try the four modes locally:

Step 1: Install the required sibling plugins

The skill aborts without bitwarden-tech-lead and bitwarden-security-engineer available in the same marketplace. Confirm both are installed before invoking.

Step 2: Trigger each mode and confirm the report shape

In a Bitwarden repo checkout, invoke each form below and confirm a code-review-*.md lands at the repo root with severity-bucketed findings and a Caught by: line on every finding:

  • PR mode — /performing-multi-agent-code-review 12345 (substitute a real PR number)
  • Local mode — make an uncommitted change, then /performing-multi-agent-code-review
  • Branch comparison — on a clean feature branch, /performing-multi-agent-code-review
  • Commit-range — /performing-multi-agent-code-review on the last 5 commits
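A quick machine check for the "Caught by: line on every finding" expectation is sketched below. The per-finding `### ` heading level is an assumption about the report template, not something this PR confirms:

```python
import re

def findings_missing_attribution(report_text):
    """Return finding headings whose block lacks a **Caught by:** line."""
    missing = []
    for block in re.split(r"\n(?=### )", report_text):
        if block.startswith("### ") and "**Caught by:**" not in block:
            missing.append(block.splitlines()[0])
    return missing

# Hypothetical report excerpt.
sample = (
    "## Blocker\n"
    "### Null deref in foo()\n**Caught by:** Bug analysis\n"
    "### Missing await in bar()\nSome description with no attribution.\n"
)
missing = findings_missing_attribution(sample)
```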

Step 3: Confirm the dismissed-findings block

Each report should include a collapsed "dismissed findings" section listing findings that didn't survive Step 4 validation or Step 5 severity audit, with the rejection reason.

@github-actions

Checkmarx One – Scan Summary & Details (2438330f-9677-47c0-b21d-888091cf8e04)

Great job! No new security vulnerabilities introduced in this pull request

"metadata": {
"description": "Official Bitwarden Claude Plugin Marketplace",
"version": "1.0.1",
"pluginRoot": "./plugins"

I have been experimenting with cross-agent configuration (namely GitHub Copilot) and I found that this configuration is incorrect. pluginRoot should only be used if the plugins below did not have the ./plugins/ prefix in their source property. Since all of them do, it should be removed.
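The reasoning here can be expressed as a small consistency check. This is illustrative only: the metadata field names come from the quoted marketplace.json snippet, and the shape of the plugins list is assumed:

```python
def plugin_root_is_redundant(marketplace):
    """True when every plugin source already carries the pluginRoot prefix,
    so keeping pluginRoot would double-apply it."""
    root = marketplace.get("metadata", {}).get("pluginRoot")
    if not root:
        return False
    prefix = root.rstrip("/") + "/"
    sources = [p.get("source", "") for p in marketplace.get("plugins", [])]
    return bool(sources) and all(s.startswith(prefix) for s in sources)

# Assumed shape, mirroring the quoted snippet above.
marketplace = {
    "metadata": {"pluginRoot": "./plugins"},
    "plugins": [
        {"name": "bitwarden-code-review", "source": "./plugins/bitwarden-code-review"},
        {"name": "bitwarden-tech-lead", "source": "./plugins/bitwarden-tech-lead"},
    ],
}
```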

- **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt; Line Number Accuracy is propagated verbatim into every Step 2–5 subagent prompt.
- **READ** `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt.

2. Launch a single architecture & pattern compliance agent using the `bitwarden-tech-lead` subagent type. Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description.

When working locally against teammates' feature branches, I have found this particular subagent to be very useful. It has pointed out where a teammate has unintentionally violated a CLAUDE.md or README.md rule; the agents that only scan the diffs aren't finding those.


Execute these steps in order. Do not skip, reorder, or combine steps.

1. Gather context (no subagents). All `references/...` paths below resolve relative to this skill's directory — do not search elsewhere.

I spent the most time working to refine and clearly tell the main agent (the orchestrator, for lack of a better name) what context it must gather and pass to subagents, because they work independently with their own context windows.


This is not an exhaustive checklist — surface anything diff-visible that a senior engineer would flag in a real review.

## Line Number Accuracy

The first few iterations of the skill saw Claude Code deliver some funky line numbers; therefore, I added this brief direct instruction to help guide it and I have seen much better results.
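For context, one concrete way to keep diff-derived line numbers honest is to track the new-file counter from unified-diff hunk headers. This is an illustrative technique, not necessarily how the skill's instruction phrases it:

```python
import re

# Unified-diff hunk header: "@@ -old_start,old_count +new_start,new_count @@"
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def new_file_line_numbers(diff_text):
    """Map each added line's content to its line number in the new file."""
    numbers = {}
    new_line = 0
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            new_line = int(m.group(1))  # new-file start of this hunk
            continue
        if line.startswith("+") and not line.startswith("+++"):
            numbers[line[1:]] = new_line
            new_line += 1
        elif not line.startswith("-"):
            new_line += 1  # context lines advance the new-file counter
    return numbers

# Hypothetical diff fragment.
diff = "@@ -1,2 +10,3 @@\n context\n+added one\n-removed\n+added two\n"
numbers = new_file_line_numbers(diff)
```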


As noted in the pull request description, these standards are intentionally different because of the depth of work being done by the subagents. The intention is that we don't surface noise to engineers, but real signals of problems. I feel strongly that humans should be relied upon for suggestions and nit-picks. Claude should stay in this lane IMO.

- Speculative issues that depend on specific inputs or runtime state without evidence those inputs occur in practice.
- Pre-existing issues not introduced or worsened by this change.

## Confidence Scoring

I have found good success with providing Claude with a scale that has concrete descriptions of what is and is not considered a confident finding.
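As an illustration only (the skill's actual scale lives in references/evaluation-standards.md and may differ), a concrete rubric with band descriptions might look like:

```python
# Hypothetical rubric; band boundaries and wording are invented for illustration.
CONFIDENCE_RUBRIC = {
    (90, 100): "Verified against the diff; reproducible from the changed lines alone.",
    (80, 89): "Strong evidence in the diff; minor assumptions about surrounding code.",
    (50, 79): "Plausible, but depends on unverified inputs or runtime state.",
    (0, 49): "Speculative; would need broader codebase context to confirm.",
}

def rubric_band(score):
    """Return the description for the band containing score, or None."""
    for (low, high), description in CONFIDENCE_RUBRIC.items():
        if low <= score <= high:
            return description
    return None
```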


Arguably the best suggestion from the /skill-creator was this markdown. Giving the subagents a concrete message format has drastically improved the consistency of findings, and being able to track which agent came up with which finding is also very helpful.


Given that the skill is designed to be very flexible, keeping everything in the main SKILL.md became overwhelming.


6. Merge all Step 4 and Step 5 returns by `id` into the master finding map. Creation-time fields are immutable (see `references/finding-shape.md`). For dismissed findings, set `dismissal_stage` to `"Step 4 validation"` or `"Step 5 severity audit"` based on which step set the dismissal status — it renders as `**Dismissed at:**`. Partition by final status: validated (Step 5 `confirmed` or `downgraded`) becomes the main Findings section; dismissed (Step 4 `dismissed` or Step 5 `dismissed`) preserves original severity, original confidence, dismissal stage, and dismissal reason for rendering in the Dismissed block.

7. Format the report using the template in `references/report-template.md`. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them; per-finding traceability requires the full set.
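The Step 6 merge and partition can be sketched as follows (simplified; the real finding shape and status values come from references/finding-shape.md):

```python
def merge_and_partition(master, step4_returns, step5_returns):
    """Merge validation and audit returns by id, then split the findings."""
    for ret in step4_returns:
        finding = master[ret["id"]]
        finding["status"] = ret["status"]
        if ret["status"] == "dismissed":
            finding["dismissal_stage"] = "Step 4 validation"
            finding["dismissal_reason"] = ret.get("reason")
    for ret in step5_returns:
        finding = master[ret["id"]]
        if finding["status"] == "dismissed":
            continue  # Step 4 dismissals stand
        finding["status"] = ret["status"]
        if ret["status"] == "dismissed":
            finding["dismissal_stage"] = "Step 5 severity audit"
            finding["dismissal_reason"] = ret.get("reason")
    validated = [f for f in master.values() if f["status"] in ("confirmed", "downgraded")]
    dismissed = [f for f in master.values() if f["status"] == "dismissed"]
    return validated, dismissed

# Hypothetical master map and subagent returns.
master = {
    "a": {"id": "a", "severity": "Blocker"},
    "b": {"id": "b", "severity": "Important"},
    "c": {"id": "c", "severity": "Refactor"},
}
step4 = [
    {"id": "a", "status": "validated"},
    {"id": "b", "status": "dismissed", "reason": "not diff-visible"},
    {"id": "c", "status": "validated"},
]
step5 = [
    {"id": "a", "status": "confirmed"},
    {"id": "b", "status": "confirmed"},  # ignored: already dismissed at Step 4
    {"id": "c", "status": "downgraded"},
]
validated, dismissed = merge_and_partition(master, step4, step5)
```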

Another clear distinction from the single-agent reviews and commands: I don't want inline comments and I don't want a second file. I want a single, clear report with collapsed sections as needed, so that engineers can work the report top-to-bottom.

Adding the Caught by and Original… fields was instrumental in working with the /skill-creator skill to diagnose inconsistencies and tune the subagent prompts.


<!-- Only if there are rejected findings. Omit entirely if all confirmed. -->

## Reviewed and Dismissed

One of my favorite parts of the skill is this addition.
When creating the Seeder Utility in the server repo and refining the skill itself, it was very helpful to keep these around in a simple expand/collapse block to better understand what was going on in the process. Must keep, IMO.

@theMickster theMickster marked this pull request as ready for review May 4, 2026 08:35
@theMickster theMickster requested a review from a team as a code owner May 4, 2026 08:35
@theMickster theMickster added the ai-review Request a Claude code review label May 4, 2026
@github-actions

github-actions Bot commented May 4, 2026

🤖 Bitwarden Claude Code Review

Overall Assessment: APPROVE

Reviewed the addition of the new performing-multi-agent-code-review skill to the bitwarden-code-review plugin. The change ships a new SKILL.md plus five references/*.md files describing modes, finding shape, evaluation standards, discovery standards, and the report template. Version bump (1.10.0 → 1.11.0) is correctly mirrored across marketplace.json, plugin.json, the root README.md table, and the plugin CHANGELOG.md under a properly formatted ### Added entry. Subagent type references resolve against the installed bitwarden-tech-lead and bitwarden-security-engineer plugins, and the allowed-tools declaration follows YAML conventions used by sibling skills.

Code Review Details

No new findings. Previously raised review comments (changelog cruft, README typo, AskUserQuestion missing from allowed-tools, prereq-check markdown indentation, changelog ### Added heading) are addressed and marked resolved. The open pluginRoot discussion on marketplace.json:10 and the resolved README.md link-target thread are already tracked and do not need duplication here.

Comment thread plugins/bitwarden-code-review/CHANGELOG.md Outdated
Comment thread plugins/bitwarden-code-review/README.md Outdated
@theMickster theMickster added the ai-review-vnext Request a Claude code review using the vNext workflow label May 4, 2026
Comment thread plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md Outdated
@theMickster theMickster removed ai-review Request a Claude code review ai-review-vnext Request a Claude code review using the vNext workflow labels May 4, 2026
@theMickster theMickster added the ai-review Request a Claude code review label May 4, 2026
@theMickster

Note: I am intentionally putting actions/workflows/review-code.yml through its paces. I am testing my recent change that triggers the workflow by applying the ai-review label.

Comment thread plugins/bitwarden-code-review/CHANGELOG.md
Comment thread plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md Outdated
