PM-33385 - Craft a multi-agent code review #96
Great job! No new security vulnerabilities introduced in this pull request.
Co-authored-by: Copilot <copilot@github.com>
```json
"metadata": {
  "description": "Official Bitwarden Claude Plugin Marketplace",
  "version": "1.0.1",
  "pluginRoot": "./plugins"
```
I have been experimenting with cross-agent configuration (namely GitHub Copilot) and found that this configuration is incorrect. `pluginRoot` should only be used if the plugins below did not have the `./plugins/` prefix in their `source` property. However, since all of them do, this should be removed.
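A sketch of what I believe the corrected shape looks like, with `pluginRoot` dropped because each plugin's `source` already carries the `./plugins/` prefix (the plugin entry shown is illustrative, not copied from the actual marketplace file):

```json
{
  "metadata": {
    "description": "Official Bitwarden Claude Plugin Marketplace",
    "version": "1.0.1"
  },
  "plugins": [
    {
      "name": "performing-multi-agent-code-review",
      "source": "./plugins/performing-multi-agent-code-review"
    }
  ]
}
```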
> - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt; Line Number Accuracy is propagated verbatim into every Step 2–5 subagent prompt.
> - **READ** `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt.
> 2. Launch a single architecture & pattern compliance agent using the `bitwarden-tech-lead` subagent type. Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description.
When working locally against fellow teammates' feature branches, I have found this particular subagent to be very useful. It has pointed out where a teammate unintentionally violated a CLAUDE.md or README.md rule; the agents that only scan the diffs aren't finding those.
> Execute these steps in order. Do not skip, reorder, or combine steps.
> 1. Gather context (no subagents). All `references/...` paths below resolve relative to this skill's directory — do not search elsewhere.
I spent the most time working to refine and clearly tell the main agent (the orchestrator, for lack of a better name) what context it must gather and pass to subagents, because they work independently with their own context windows.
> This is not an exhaustive checklist — surface anything diff-visible that a senior engineer would flag in a real review.
> ## Line Number Accuracy
The first few iterations of the skill saw Claude Code deliver some funky line numbers, so I added this brief, direct instruction to guide it and have seen much better results.
As noted in the pull request description, these standards are intentionally different because of the depth of work being done by the subagents. The intention is that we don't surface noise to engineers, but real signals of problems. I feel strongly that humans should be relied upon for suggestions and nit-picks. Claude should stay in this lane IMO.
> - Speculative issues that depend on specific inputs or runtime state without evidence those inputs occur in practice.
> - Pre-existing issues not introduced or worsened by this change.
> ## Confidence Scoring
I have found good success with providing Claude with a scale that has concrete descriptions of what is and is not considered a confident finding.
Arguably the best suggestion from the /skill-creator was this markdown. Giving the subagents a concrete messaging format has drastically improved consistency of findings, and being able to track which agent came up with which finding is also very helpful.
Given that the skill is designed to be very flexible, keeping it all in the main SKILL.md became overwhelming.
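For concreteness, a single finding in this message format might look roughly like the following; the exact shape lives in `references/finding-shape.md`, so treat everything here (the id scheme, the file path, the wording) as an illustrative guess rather than the real format:

```json
{
  "id": "bug-003",
  "source_agent": "bug-analysis",
  "severity": "high",
  "confidence": 4,
  "file": "src/auth/session-timeout.ts",
  "line": 112,
  "summary": "Session timeout comparison uses seconds where the config value is milliseconds.",
  "status": "open"
}
```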
> 6. Merge all Step 4 and Step 5 returns by `id` into the master finding map. Creation-time fields are immutable (see `references/finding-shape.md`). For dismissed findings, set `dismissal_stage` to `"Step 4 validation"` or `"Step 5 severity audit"` based on which step set the dismissal status — it renders as `**Dismissed at:**`. Partition by final status: validated (Step 5 `confirmed` or `downgraded`) becomes the main Findings section; dismissed (Step 4 `dismissed` or Step 5 `dismissed`) preserves original severity, original confidence, dismissal stage, and dismissal reason for rendering in the Dismissed block.
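The merge-and-partition logic in this step can be sketched in a few lines of Python (names beyond `id`, `status`, and `dismissal_stage` are my own; the real field set is defined in `references/finding-shape.md`):

```python
def merge_and_partition(master, step4_returns, step5_returns):
    """Fold Step 4/5 returns into the master finding map by id, then partition.

    Creation-time fields on each finding are never overwritten; only the
    status and dismissal metadata are updated.
    """
    for stage, returns in (("Step 4 validation", step4_returns),
                           ("Step 5 severity audit", step5_returns)):
        for ret in returns:
            finding = master[ret["id"]]
            finding["status"] = ret["status"]
            if ret["status"] == "dismissed":
                finding["dismissal_stage"] = stage   # renders as **Dismissed at:**
                finding["dismissal_reason"] = ret.get("reason", "")

    # Validated findings go in the main Findings section; dismissed ones in
    # the Dismissed block.
    validated = [f for f in master.values() if f["status"] in ("confirmed", "downgraded")]
    dismissed = [f for f in master.values() if f["status"] == "dismissed"]
    return validated, dismissed
```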
> 7. Format the report using the template in `references/report-template.md`. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them; per-finding traceability requires the full set.
Another clear distinction from the single agent reviews and commands. I don't want inline comments and I don't want a second file. I want to see a single, clear report with collapsed sections as needed so that engineers can work the report top-to-bottom.
Adding the `Caught by:` and the `Original ...` fields was instrumental in working with the /skill-creator skill to diagnose inconsistencies and tune the subagent prompts.
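As a sketch of how those fields drive the per-finding rendering described in Step 7, assuming hypothetical field names (the real ones live in `references/finding-shape.md`, and the real label table in `references/report-template.md`):

```python
FRIENDLY_LABELS = {
    # source_agent -> friendly label; this mapping is assumed, the actual
    # table is in references/report-template.md
    "bug-analysis": "Bug analysis",
    "security": "Security & logic",
}

def render_finding(finding):
    """Render one finding; dismissed findings carry four extra trace lines."""
    lines = [
        f"{finding['file']}:{finding['line']}",
        f"**Caught by:** {FRIENDLY_LABELS.get(finding['source_agent'], finding['source_agent'])}",
    ]
    if finding["status"] == "dismissed":
        lines += [
            f"**Original severity:** {finding['original_severity']}",
            f"**Original confidence:** {finding['original_confidence']}",
            f"**Dismissed at:** {finding['dismissal_stage']}",
            f"**Dismissed because:** {finding['dismissal_reason']}",
        ]
    return "\n".join(lines)
```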
> <!-- Only if there are rejected findings. Omit entirely if all confirmed. -->
> ## Reviewed and Dismissed
One of my favorite parts of the skill is this addition.
When creating the Seeder Utility in the server repo and refining the skill itself, it was very helpful to keep these around in a simple expand/collapse section to better understand what was going on in the process. Must keep IMO.
🤖 Bitwarden Claude Code Review
Overall Assessment: APPROVE
Reviewed the addition of the new
Code Review Details
No new findings. Previously raised review comments (changelog cruft, README typo,
Note: I am intentionally putting `actions/workflows/review-code.yml` through its paces. I am testing my recent change that triggers the workflow via applying the
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

🎟️ Tracking
PM-33385 - Create multi-agent code review skill
📔 Objective
Adds a new `performing-multi-agent-code-review` skill. Intentionally a different lens from the existing `bitwarden-code-reviewer` agent — that agent reviews code the way a human reviewer would; this skill has Claude evaluate code as Claude, with output constraints a human can read. The pipeline runs 6 subagents per review, each with a specific purpose and well-defined prompt.

Nitty-gritty skill details
- `--model` to dial cost vs. depth.
- Discovery agents (`quality`, `bug`, `security`) → per-finding validation → severity audit.
- `Caught by` sub-agent attribution (Architecture, Code quality, Bug analysis, Security & logic, Validation).

Use cases
- At the end of a [Claude Code Agent Teams](https://code.claude.com/docs/en/agent-teams) coding session — slots into an agent feedback loop where teammates work, address findings, and refactor.
- When an engineer is approaching the end of local development work and needs depth of review.
- When an engineer has completed work on a draft PR and needs depth of review prior to publish.
- When an engineer is peer-reviewing a high-density, cross-team, or very complex pull request.

Field test results on live PRs
Cost estimates
Cost comparison sdk-internal/pull/1020

- multi-agent review
- single-agent `/bitwarden-code-review:code-review-local`

Cost comparison misc/pull/348

- multi-agent review
- single-agent review
🧪 Testing
Expand for reviewer steps to try the four modes locally
Step 1: Install the required sibling plugins
The skill aborts without `bitwarden-tech-lead` and `bitwarden-security-engineer` available in the same marketplace. Confirm both are installed before invoking.
In a Bitwarden repo checkout, invoke each form below and confirm a `code-review-*.md` lands at the repo root with severity-bucketed findings and a `Caught by:` line on every finding:

- `/performing-multi-agent-code-review 12345` (substitute a real PR number)
- `/perform-multi-agent-code-review`
- `/perform-multi-agent-code-review`
- `/performing-multi-agent-code-review on the last 5 commits`
Each report should include a collapsed "dismissed findings" section listing findings that didn't survive Step 4 validation or Step 5 severity audit, with the rejection reason.