feat(SUD-1512): add ocho.i18n-parity plugin — full multi-locale scan + 5 worker tools#1915

Open
achtung-ocho wants to merge 1 commit into paperclipai:master from achtung-ocho:feat/SUD-1512-i18n-parity-plugin

@achtung-ocho

Summary

Implements the ocho.i18n-parity Paperclip plugin (Phase 1 + Phase 2 of SUD-1510 plan).

  • manifest.ts: plugin ocho.i18n-parity with 5 tools, 3 UI slots, config schema
  • worker.ts: cheerio-based HTML scanner across all non-EN locales × EN_BASELINE_ROUTES
    • Per-surface extraction (meta, nav, hero, main, cta, footer, embeds)
    • EN likelihood scoring via stopword rate + script detection (CJK/Cyrillic/Devanagari)
    • Missing locale files scored as 0 (still_english_flag: true, missing: true)
    • v1 report schema: schema_version, generated_at, config, localization.{summary, pages}, analytics: null, search_console: null
    • Scan history stored in-memory keyed by generated_at ISO string
  • 5 tool handlers: run-scan, get-report, get-summary, get-page-detail, create-tickets
  • ui/index.tsx: dashboard page with filterable table, sidebar link, dashboard widget
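As an illustration of the "stopword rate + script detection" scoring mentioned above, here is a minimal sketch. It is not the plugin's actual implementation: the function name `englishLikelihood`, the stopword list, the character ranges, and the 0.5 script cutoff are all assumptions chosen for the example.

```typescript
// Hypothetical stopword list; a real scanner would likely use a longer one.
const EN_STOPWORDS = new Set([
  "the", "and", "of", "to", "in", "is", "for", "with",
  "that", "this", "on", "are", "you", "your",
]);

// BMP ranges for Cyrillic, Devanagari, kana, CJK ideographs, and Hangul.
const NON_LATIN = /[\u0400-\u04FF\u0900-\u097F\u3040-\u30FF\u4E00-\u9FFF\uAC00-\uD7AF]/;
const LATIN = /[A-Za-z]/;

/** Returns a 0..1 estimate of how English a text surface looks (assumed scale). */
function englishLikelihood(text: string): number {
  let latin = 0;
  let nonLatin = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (NON_LATIN.test(ch)) nonLatin++;
    else if (LATIN.test(ch)) latin++;
  }
  if (latin + nonLatin === 0) return 0; // no letters to score

  // Script detection: mostly non-Latin text is treated as translated.
  if (nonLatin / (latin + nonLatin) > 0.5) return 0;

  // Stopword rate: share of Latin-script words that are common English words.
  const words: string[] = text.toLowerCase().match(/[a-z']+/g) ?? [];
  if (words.length === 0) return 0;
  const hits = words.filter((w) => EN_STOPWORDS.has(w)).length;
  return hits / words.length;
}
```

The script check runs first so that a Cyrillic or CJK page is never penalized for a few embedded English brand names. Latin-script locales such as German or Spanish rely on the stopword rate alone.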

Validation

  • tsc --noEmit: clean ✅
  • pnpm build: passes (dist/manifest.js, dist/worker.js, dist/ui/index.js) ✅
  • Smoke test: v1 schema shape confirmed, 16 non-EN locales × baseline routes scanned ✅

Issue

Closes SUD-1512 / SUD-1511 (scaffold + scanner core + full scan + tools)

🤖 Generated with Claude Code

…+ 5 worker tools

- manifest.ts: plugin definition with 5 tools, 3 UI slots, config schema
- worker.ts: cheerio-based scanner across all non-EN locales × EN_BASELINE_ROUTES
  - per-surface extraction (meta, nav, hero, main, cta, footer, embeds)
  - EN likelihood scoring via stopword rate + script detection (CJK/Cyrillic/Devanagari)
  - missing locale files scored as 0 with still_english_flag=true
  - v1 report schema with summary keyed by locale (total_pages, above_threshold, avg_score, worst_pages)
  - scan history stored in memory keyed by generated_at timestamp
- 5 tool handlers: run-scan, get-report, get-summary, get-page-detail, create-tickets
- ui/index.tsx: dashboard page with filterable table, sidebar link, widget
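The per-locale summary fields listed above (`total_pages`, `above_threshold`, `avg_score`, `worst_pages`) could be derived from the page list roughly as follows. This is a sketch under assumptions: the `PageResult` shape, the `route` field, the `worstCount` parameter, and the reading of `above_threshold` as "pages scoring at or above the threshold" are not confirmed by the PR.

```typescript
// Hypothetical page record; only the score field name appears in the review.
interface PageResult {
  locale: string;
  route: string;
  page_localization_score: number; // assumed: 0 = untranslated or missing, 1 = fully localized
}

interface LocaleSummary {
  total_pages: number;
  above_threshold: number;
  avg_score: number;
  worst_pages: PageResult[];
}

/** Groups pages by locale and computes one summary entry per locale. */
function summarizeByLocale(
  pages: PageResult[],
  threshold: number,
  worstCount = 3,
): Record<string, LocaleSummary> {
  const byLocale = new Map<string, PageResult[]>();
  for (const page of pages) {
    const list: PageResult[] = byLocale.get(page.locale) ?? [];
    list.push(page);
    byLocale.set(page.locale, list);
  }

  const summary: Record<string, LocaleSummary> = {};
  byLocale.forEach((list, locale) => {
    // Sort a copy ascending so the lowest-scoring pages come first.
    const sorted = list
      .slice()
      .sort((a, b) => a.page_localization_score - b.page_localization_score);
    const total = list.reduce((sum, p) => sum + p.page_localization_score, 0);
    summary[locale] = {
      total_pages: list.length,
      above_threshold: list.filter((p) => p.page_localization_score >= threshold).length,
      avg_score: total / list.length,
      worst_pages: sorted.slice(0, worstCount),
    };
  });
  return summary;
}
```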

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR introduces the ocho.i18n-parity Paperclip plugin: a cheerio-based HTML scanner that scores translation parity across all non-EN locales and 30+ baseline routes, exposes 5 agent tools, and adds a React dashboard (sidebar, full-page report, widget). The scanner logic, surface extractors, and tool handler plumbing are well-structured.

However, there are three P1 defects that need to be resolved before merge:

  • Broken UI data contract (P1): The worker's ctx.data.register handler returns a raw V1Report object (with generated_at, localization.pages, localization.summary) after a scan, but the UI reads data.scannedAt, data.pages, and data.summary — fields that don't exist on V1Report. The result is that all three UI components (I18nParitySidebar, I18nParityPage, I18nParityWidget) will always render their "No scan data" empty state even after a successful scan completes.
  • pageLimit silently ignored (P1): The run-scan manifest schema advertises a pageLimit parameter, but neither the tool handler nor runScan() read or apply it. Passing pageLimit has no effect.
  • maxTickets missing from manifest (P1): The create-tickets tool handler reads and applies input.maxTickets, but it is absent from the manifest's parametersSchema, so AI consumers cannot discover or use this parameter.

Additionally, per CONTRIBUTING.md, this PR is a larger/impactful change and should include a thinking path, details about why the change matters, how to verify it works, any risks, and before/after screenshots of the new UI components.

Confidence Score: 4/5

Not safe to merge as-is — the UI will never display scan results due to a data-shape mismatch between the worker and UI.

Three P1 defects are present: (1) the data contract between worker and UI is broken so the dashboard is non-functional after scanning, (2) the pageLimit parameter is advertised but never applied, and (3) maxTickets is implemented in the worker but absent from the manifest schema. These all need fixing before this plugin can be used as intended.

src/worker.ts (data handler reshape + pageLimit wiring) and src/manifest.ts (add maxTickets) need the most attention; src/ui/index.tsx types should be reconciled with the worker's V1Report shape.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/plugins/examples/plugin-i18n-parity/src/worker.ts | Core scanner and tool handlers. Contains two P1 bugs (the data handler returns a V1Report shape that the UI cannot consume, so the UI is broken after a scan; the pageLimit parameter is declared in the manifest but silently ignored here) and one P2 issue (scan state is held only in module-level memory and is lost on restart). |
| packages/plugins/examples/plugin-i18n-parity/src/manifest.ts | Plugin manifest with 5 tools and 3 UI slots. maxTickets is missing from the create-tickets parametersSchema, and pageLimit is declared for run-scan but not implemented in the worker. |
| packages/plugins/examples/plugin-i18n-parity/src/ui/index.tsx | Dashboard UI with sidebar, page, and widget components. Its type definitions diverge from V1Report (it uses weightedScore, langAttr, scannedAt, and a flat summary array), so all components render the empty/no-data state after a real scan. |
| packages/plugins/examples/plugin-i18n-parity/src/index.ts | Trivial barrel file re-exporting manifest and worker defaults. |
| packages/plugins/examples/plugin-i18n-parity/package.json | Package definition with correct workspace dependencies, build scripts, and plugin metadata. |
| packages/plugins/examples/plugin-i18n-parity/scripts/build-ui.mjs | esbuild script for bundling the UI; correctly externalises React and the plugin SDK UI module. |
| packages/plugins/examples/plugin-i18n-parity/tsconfig.json | TypeScript config; no issues observed. |

Reviews (1): Last reviewed commit: "feat(SUD-1512): add ocho.i18n-parity plu..."

Comment on lines +541 to +544
ctx.data.register("i18n-parity-report", async () => {
  if (!latestScanKey) return { pages: [], summary: {}, scannedAt: null };
  return scanHistory.get(latestScanKey) ?? { pages: [], summary: {}, scannedAt: null };
});

P1 Data contract mismatch: UI will never render scan results

The data handler returns either { pages: [], summary: {}, scannedAt: null } (no scan) or a full V1Report (after scan). However the V1Report type uses:

  • generated_at (not scannedAt)
  • localization.pages (not pages)
  • localization.summary (not summary, and it's a Record<string, V1LocaleSummary> not an array)

The UI (index.tsx) checks data.scannedAt on lines 87, 127, and 233. Since V1Report has no scannedAt property, this is always undefined, causing the UI to permanently render the "No scan data available" empty state even after a successful scan.

Additionally, data.pages and data.summary are accessed in the UI but the actual data lives at data.localization.pages and data.localization.summary in the V1Report shape.

The data handler needs to project/reshape the V1Report into the shape the UI expects:

ctx.data.register("i18n-parity-report", async () => {
  if (!latestScanKey) return { pages: [], summary: [], scannedAt: null };
  const report = scanHistory.get(latestScanKey);
  if (!report) return { pages: [], summary: [], scannedAt: null };
  const summary = Object.entries(report.localization.summary).map(([locale, s]) => ({
    locale,
    pageCount: s.total_pages,
    flaggedCount: s.total_pages - s.above_threshold,
    averageScore: s.avg_score,
    minScore: s.worst_pages[0]?.page_localization_score ?? 0,
  }));
  return {
    scannedAt: report.generated_at,
    pages: report.localization.pages.map((p) => ({
      ...p,
      weightedScore: p.page_localization_score,
      langAttr: null,
      surfaces: Object.entries(p.surfaces).map(([surface, r]) => ({ surface, ...r })),
    })),
    summary,
  };
});
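One way to catch this kind of mismatch early is a small runtime guard on the UI-facing shape, so a wrong payload fails a check instead of silently falling through to the empty state. The sketch below is illustrative only: `UiReport` is inferred from the fields this comment says the UI reads, and `isUiReport`, `rawReport`, and `projected` are hypothetical names with sample data.

```typescript
// Hypothetical UI-facing shape, inferred from the fields named in this review.
interface UiReport {
  scannedAt: string | null;
  pages: unknown[];
  summary: unknown[];
}

/** Type guard: true only when the payload has the three fields in the expected form. */
function isUiReport(value: unknown): value is UiReport {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    (typeof v["scannedAt"] === "string" || v["scannedAt"] === null) &&
    Array.isArray(v["pages"]) &&
    Array.isArray(v["summary"])
  );
}

// Sample V1Report-shaped payload, as the unpatched handler would return it.
const rawReport = {
  generated_at: "2026-03-27T00:00:00.000Z",
  localization: { pages: [] as unknown[], summary: {} },
};

// Sample payload after projecting into the UI shape.
const projected = {
  scannedAt: rawReport.generated_at,
  pages: rawReport.localization.pages,
  summary: [] as unknown[],
};

console.log(isUiReport(rawReport)); // raw report lacks scannedAt, pages, summary at top level
console.log(isUiReport(projected));
```

Note that the original empty-state object (`summary: {}`) would also fail this guard, because the UI expects `summary` to be an array.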

Comment on lines +559 to +579
async (params): Promise<ToolResult> => {
  try {
    const input = params as { locale?: string };
    const config = await getConfig(ctx);
    if (!config.repoPath) return { error: "repoPath is not configured." };

    const report = runScan(config, input.locale, ctx.logger);
    latestScanKey = report.generated_at;
    scanHistory.set(latestScanKey, report);

    const totalPages = report.localization.pages.length;
    const flagged = report.localization.pages.filter((p) => p.still_english_flag).length;
    return {
      content: `Scan complete. ${totalPages} pages across ${report.config.locales_scanned.length} locale(s). ${flagged} still-English pages flagged.`,
      data: report,
    };
  } catch (err) {
    return { error: `run-scan failed: ${err instanceof Error ? err.message : String(err)}` };
  }
},
);

P1 pageLimit parameter silently ignored

The manifest's run-scan tool declares pageLimit as a supported parameter (manifest.ts lines 72-76), but the handler here only reads input.locale; input.pageLimit is never consumed. The runScan() function also has no pageLimit argument. Callers who pass pageLimit expecting a capped scan will receive a full, uncapped scan with no warning.

Either remove pageLimit from the manifest schema, or thread it through:

const input = params as { locale?: string; pageLimit?: number };
const config = await getConfig(ctx);
if (!config.repoPath) return { error: "repoPath is not configured." };

const report = runScan(config, input.locale, input.pageLimit, ctx.logger);

And update runScan to accept and apply a pageLimit per locale.
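A per-locale cap could look roughly like the sketch below. The helper names `limitRoutes` and `planScan` are hypothetical and do not come from the PR, and treating a missing or non-positive limit as "no cap" is an assumption; rejecting such values with an error would be an equally valid choice.

```typescript
/** Hypothetical helper: routes to scan for one locale, capped by pageLimit. */
function limitRoutes(routes: readonly string[], pageLimit?: number): string[] {
  // Assumption: undefined, non-finite, or non-positive limits mean "no cap".
  if (pageLimit === undefined || !isFinite(pageLimit) || pageLimit <= 0) {
    return routes.slice();
  }
  return routes.slice(0, Math.floor(pageLimit));
}

/** Hypothetical helper: expands locales x routes, applying the cap per locale. */
function planScan(
  locales: readonly string[],
  routes: readonly string[],
  pageLimit?: number,
): Array<{ locale: string; route: string }> {
  const plan: Array<{ locale: string; route: string }> = [];
  for (const locale of locales) {
    for (const route of limitRoutes(routes, pageLimit)) {
      plan.push({ locale, route });
    }
  }
  return plan;
}

// Usage: two locales, three routes, capped at two pages per locale.
const plan = planScan(["de", "ja"], ["/", "/pricing", "/docs"], 2);
console.log(plan.length); // 4
```

Applying the cap inside the per-locale loop, rather than to the combined list, keeps every locale represented in a capped scan.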

Comment on lines +116 to +136
{
  name: "create-tickets",
  displayName: "Create Parity Tickets",
  description:
    "Creates Paperclip issues for pages that fall below the minScore threshold from the most recent scan.",
  parametersSchema: {
    type: "object",
    properties: {
      minScore: {
        type: "number",
        description:
          "Override threshold (0–1). Defaults to plugin config minScore.",
      },
      dryRun: {
        type: "boolean",
        description:
          "If true, returns planned ticket list without creating issues.",
      },
    },
  },
},

P1 maxTickets parameter missing from create-tickets manifest schema

The worker's create-tickets handler reads and applies input.maxTickets (worker.ts lines 694–703), but this parameter is not declared in the manifest's parametersSchema. AI model consumers won't know it's a valid input, and Paperclip's parameter validation may reject or ignore it silently.

Suggested change
{
  name: "create-tickets",
  displayName: "Create Parity Tickets",
  description:
    "Creates Paperclip issues for pages that fall below the minScore threshold from the most recent scan.",
  parametersSchema: {
    type: "object",
    properties: {
      minScore: {
        type: "number",
        description:
          "Override threshold (0–1). Defaults to plugin config minScore.",
      },
      dryRun: {
        type: "boolean",
        description:
          "If true, returns planned ticket list without creating issues.",
      },
      maxTickets: {
        type: "number",
        description:
          "Cap on the number of tickets to create in a single call.",
      },
    },
  },
},
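Both this defect and the pageLimit one are cases of the manifest schema and the handler drifting apart. A small check like the sketch below could flag them in a unit test. Everything here is hypothetical: the `ToolSchema` type is a trimmed stand-in for the real manifest type, the list of keys each handler reads is maintained by hand, and the `run-scan` fixture assumes `locale` is declared as a string.

```typescript
// Trimmed, hypothetical stand-in for a manifest tool entry.
interface ToolSchema {
  name: string;
  parametersSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
  };
}

/** Keys the handler reads that the manifest does not declare. */
function undeclaredParams(tool: ToolSchema, handlerReads: readonly string[]): string[] {
  const declared = tool.parametersSchema.properties;
  return handlerReads.filter(
    (key) => !Object.prototype.hasOwnProperty.call(declared, key),
  );
}

/** Keys the manifest declares that the handler never reads. */
function unusedParams(tool: ToolSchema, handlerReads: readonly string[]): string[] {
  return Object.keys(tool.parametersSchema.properties).filter(
    (key) => handlerReads.indexOf(key) === -1,
  );
}

// Sample fixtures mirroring the state described in this review.
const createTicketsTool: ToolSchema = {
  name: "create-tickets",
  parametersSchema: {
    type: "object",
    properties: { minScore: { type: "number" }, dryRun: { type: "boolean" } },
  },
};
const runScanTool: ToolSchema = {
  name: "run-scan",
  parametersSchema: {
    type: "object",
    properties: { locale: { type: "string" }, pageLimit: { type: "number" } },
  },
};

console.log(undeclaredParams(createTicketsTool, ["minScore", "dryRun", "maxTickets"])); // ["maxTickets"]
console.log(unusedParams(runScanTool, ["locale"])); // ["pageLimit"]
```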

Comment on lines +134 to +136
const scanHistory = new Map<string, V1Report>();
let latestScanKey: string | null = null;
let cachedCompanyId: string | null = null;

P2 In-memory state lost on worker restart with no user-facing warning

scanHistory, latestScanKey, and cachedCompanyId are module-level variables. Any worker restart (deploy, crash, idle timeout) silently wipes all scan history, causing all subsequent get-report, get-summary, get-page-detail, and create-tickets calls to return "No scan report available." The user will only discover this by running run-scan again.

Consider at minimum logging a warning on setup that state is volatile, or surfacing scannedAt in onHealth() so operators can detect a cold worker. A future improvement would be persisting to ctx.kv or similar storage.
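A persistence layer along these lines might look like the following sketch. The `KeyValueStore` interface is a hypothetical abstraction and is not the real `ctx.kv` API, whose signature is not shown in this PR. `ScanStore`, `MemoryKv`, and the storage key are likewise illustrative names.

```typescript
// Hypothetical storage interface; NOT the real ctx.kv API.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

/** Keeps the latest report in memory and mirrors it to a key-value store. */
class ScanStore<Report extends { generated_at: string }> {
  private latest: Report | null = null;
  private readonly kv: KeyValueStore;
  private readonly key: string;

  constructor(kv: KeyValueStore, key = "i18n-parity:latest") {
    this.kv = kv;
    this.key = key;
  }

  async save(report: Report): Promise<void> {
    this.latest = report;
    await this.kv.set(this.key, JSON.stringify(report));
  }

  /** Returns the cached report, falling back to the store after a restart. */
  async load(): Promise<Report | null> {
    if (this.latest) return this.latest;
    const raw = await this.kv.get(this.key);
    this.latest = raw === undefined ? null : (JSON.parse(raw) as Report);
    return this.latest;
  }

  /** Value a health hook could surface so operators can spot a cold worker. */
  async health(): Promise<{ scannedAt: string | null }> {
    const report = await this.load();
    return { scannedAt: report ? report.generated_at : null };
  }
}

// In-memory stand-in for the store, useful in tests.
class MemoryKv implements KeyValueStore {
  private readonly data = new Map<string, string>();
  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}
```

Creating a second `ScanStore` over the same store simulates a worker restart: the in-memory cache is empty, but `load()` recovers the last saved report.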
