feat(SUD-1512): add ocho.i18n-parity plugin — full multi-locale scan + 5 worker tools #1915
…+ 5 worker tools

- manifest.ts: plugin definition with 5 tools, 3 UI slots, config schema
- worker.ts: cheerio-based scanner across all non-EN locales × EN_BASELINE_ROUTES
- per-surface extraction (meta, nav, hero, main, cta, footer, embeds)
- EN likelihood scoring via stopword rate + script detection (CJK/Cyrillic/Devanagari)
- missing locale files scored as 0 with still_english_flag=true
- v1 report schema with summary keyed by locale (total_pages, above_threshold, avg_score, worst_pages)
- scan history stored in memory keyed by generated_at timestamp
- 5 tool handlers: run-scan, get-report, get-summary, get-page-detail, create-tickets
- ui/index.tsx: dashboard page with filterable table, sidebar link, widget

Co-Authored-By: Paperclip <noreply@paperclip.ing>
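For readers unfamiliar with the approach, the EN likelihood scoring named in the commit message (stopword rate plus script detection) might look roughly like the sketch below. Everything here is an assumption for illustration: the stopword list, the code unit ranges, the 0.3 cutoff, and the names `nonLatinRatio` and `enLikelihood` are hypothetical and are not taken from worker.ts.

```typescript
// Hypothetical sketch: the stopword list, script ranges, and 0.3 cutoff are
// illustrative assumptions, not the plugin's actual values.
const EN_STOPWORDS: ReadonlySet<string> = new Set([
  "the", "and", "of", "to", "in", "is", "for", "with", "that", "this", "on", "are",
]);

// BMP code unit ranges for scripts that rule out English text.
const NON_LATIN_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x0400, 0x04ff], // Cyrillic
  [0x0900, 0x097f], // Devanagari
  [0x3040, 0x30ff], // Hiragana and Katakana
  [0x4e00, 0x9fff], // CJK Unified Ideographs
];

// Share of non-whitespace characters that fall in a non-Latin script range.
function nonLatinRatio(text: string): number {
  let total = 0;
  let nonLatin = 0;
  for (let i = 0; i < text.length; i++) {
    if (/\s/.test(text.charAt(i))) continue;
    const code = text.charCodeAt(i);
    total++;
    if (NON_LATIN_RANGES.some(([lo, hi]) => code >= lo && code <= hi)) nonLatin++;
  }
  return total === 0 ? 0 : nonLatin / total;
}

// Returns a value in [0, 1]; higher means the text looks more like English.
function enLikelihood(text: string): number {
  if (nonLatinRatio(text) > 0.3) return 0; // mostly written in another script
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  if (words.length === 0) return 0;
  const hits = words.filter((w) => EN_STOPWORDS.has(w)).length;
  return hits / words.length; // stopword rate
}
```

Script detection runs first because text in Cyrillic, Devanagari, or CJK contains no Latin words at all, so a stopword rate alone would be undefined rather than low.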
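Greptile Summary
This PR introduces the ocho.i18n-parity plugin. However, there are three P1 defects that need to be resolved before merge:
- The data handler returns a shape the UI does not expect, so the dashboard never displays scan results.
- The pageLimit parameter is advertised in the manifest but never applied.
- maxTickets is implemented in the worker but absent from the manifest schema.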
Additionally, per CONTRIBUTING.md, this PR is a larger/impactful change and should include a thinking path, details about why the change matters, how to verify it works, any risks, and before/after screenshots of the new UI components.
Confidence Score: 4/5
Not safe to merge as-is: the UI will never display scan results due to a data-shape mismatch between the worker and UI.
Three P1 defects are present: (1) the data contract between worker and UI is broken so the dashboard is non-functional after scanning, (2) the pageLimit parameter is advertised but never applied, and (3) maxTickets is implemented in the worker but absent from the manifest schema. These all need fixing before this plugin can be used as intended.
src/worker.ts (data handler reshape + pageLimit wiring) and src/manifest.ts (add maxTickets) need the most attention; src/ui/index.tsx types should be reconciled with the worker's V1Report shape.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/plugins/examples/plugin-i18n-parity/src/worker.ts
Line: 541-544
Comment:
**Data contract mismatch: UI will never render scan results**
The data handler returns either `{ pages: [], summary: {}, scannedAt: null }` (no scan) or a full `V1Report` (after scan). However the `V1Report` type uses:
- `generated_at` (not `scannedAt`)
- `localization.pages` (not `pages`)
- `localization.summary` (not `summary`, and it's a `Record<string, V1LocaleSummary>` not an array)
The UI (`index.tsx`) checks `data.scannedAt` on lines 87, 127, and 233. Since `V1Report` has no `scannedAt` property, this is always `undefined`, causing the UI to permanently render the "No scan data available" empty state even after a successful scan.
Additionally, `data.pages` and `data.summary` are accessed in the UI but the actual data lives at `data.localization.pages` and `data.localization.summary` in the V1Report shape.
The data handler needs to project/reshape the V1Report into the shape the UI expects:
```ts
ctx.data.register("i18n-parity-report", async () => {
  if (!latestScanKey) return { pages: [], summary: [], scannedAt: null };
  const report = scanHistory.get(latestScanKey);
  if (!report) return { pages: [], summary: [], scannedAt: null };
  const summary = Object.entries(report.localization.summary).map(([locale, s]) => ({
    locale,
    pageCount: s.total_pages,
    flaggedCount: s.total_pages - s.above_threshold,
    averageScore: s.avg_score,
    minScore: s.worst_pages[0]?.page_localization_score ?? 0,
  }));
  return {
    scannedAt: report.generated_at,
    pages: report.localization.pages.map((p) => ({
      ...p,
      weightedScore: p.page_localization_score,
      langAttr: null,
      surfaces: Object.entries(p.surfaces).map(([surface, r]) => ({ surface, ...r })),
    })),
    summary,
  };
});
```
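As a complement to reshaping the payload in the worker, the UI could tell "no scan yet" apart from "payload has the wrong shape", so that a contract mismatch like this one surfaces as an error instead of a permanent empty state. The following is only a sketch: `UiReport`, `ReportState`, and `classifyReport` are hypothetical names, and the field list assumes the `scannedAt` / `pages` / `summary` shape described above with `summary` as an array.

```typescript
// Hypothetical UI-side guard; not part of the PR. Field names follow the
// shape the UI is described as reading (scannedAt, pages, summary).
interface UiReport {
  scannedAt: string | null;
  pages: unknown[];
  summary: unknown[];
}

type ReportState =
  | { kind: "empty" }
  | { kind: "ready"; report: UiReport }
  | { kind: "invalid"; reason: string };

function classifyReport(data: unknown): ReportState {
  if (typeof data !== "object" || data === null) {
    return { kind: "invalid", reason: "payload is not an object" };
  }
  const d = data as Record<string, unknown>;
  const pages = d["pages"];
  const summary = d["summary"];
  const scannedAt = d["scannedAt"];
  // A raw V1Report keeps these under `localization`, so it fails this check.
  if (!Array.isArray(pages) || !Array.isArray(summary)) {
    return { kind: "invalid", reason: "pages and summary must be arrays" };
  }
  if (scannedAt === null) return { kind: "empty" };
  if (typeof scannedAt !== "string") {
    return { kind: "invalid", reason: "scannedAt is missing or not a string" };
  }
  return { kind: "ready", report: { scannedAt, pages, summary } };
}
```

With a guard like this, the dashboard would render the empty state only for the `empty` case and could show the `reason` string for `invalid` payloads.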
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/plugins/examples/plugin-i18n-parity/src/worker.ts
Line: 559-579
Comment:
**`pageLimit` parameter silently ignored**
The manifest's `run-scan` tool declares `pageLimit` as a supported parameter (manifest.ts lines 72-76), but the handler here only reads `input.locale` — `input.pageLimit` is never consumed. The `runScan()` function also has no `pageLimit` argument. Callers who pass `pageLimit` expecting a capped scan will receive a full unlimited scan with no warning.
Either remove `pageLimit` from the manifest schema, or thread it through:
```ts
const input = params as { locale?: string; pageLimit?: number };
const config = await getConfig(ctx);
if (!config.repoPath) return { error: "repoPath is not configured." };
const report = runScan(config, input.locale, input.pageLimit, ctx.logger);
```
And update `runScan` to accept and apply a `pageLimit` per locale.
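One possible way to apply a per-locale cap is sketched below. The real `runScan` signature and internals are not shown in this review, so `limitRoutes`, `planScan`, and `ScanTarget` are hypothetical names, and "take the first N routes" is an assumed policy rather than the plugin's behaviour.

```typescript
// Hypothetical helpers: names and the "first N routes" policy are assumptions.
interface ScanTarget {
  locale: string;
  route: string;
}

// Caps the routes scanned for a single locale; undefined means no cap.
function limitRoutes(routes: readonly string[], pageLimit?: number): string[] {
  if (pageLimit === undefined) return routes.slice();
  if (!Number.isInteger(pageLimit) || pageLimit < 1) {
    throw new RangeError(`pageLimit must be a positive integer, got ${pageLimit}`);
  }
  return routes.slice(0, pageLimit);
}

// Builds the locale × route work list with the cap applied per locale.
function planScan(
  locales: readonly string[],
  routes: readonly string[],
  pageLimit?: number,
): ScanTarget[] {
  const limited = limitRoutes(routes, pageLimit);
  const targets: ScanTarget[] = [];
  for (const locale of locales) {
    for (const route of limited) targets.push({ locale, route });
  }
  return targets;
}
```

Rejecting a non-positive `pageLimit` rather than ignoring it avoids repeating the silent-fallback problem described in this comment.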
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/plugins/examples/plugin-i18n-parity/src/manifest.ts
Line: 116-136
Comment:
**`maxTickets` parameter missing from `create-tickets` manifest schema**
The worker's `create-tickets` handler reads and applies `input.maxTickets` (worker.ts lines 694–703), but this parameter is not declared in the manifest's `parametersSchema`. AI model consumers won't know it's a valid input, and Paperclip's parameter validation may reject or ignore it silently.
```suggestion
    {
      name: "create-tickets",
      displayName: "Create Parity Tickets",
      description:
        "Creates Paperclip issues for pages that fall below the minScore threshold from the most recent scan.",
      parametersSchema: {
        type: "object",
        properties: {
          minScore: {
            type: "number",
            description:
              "Override threshold (0–1). Defaults to plugin config minScore.",
          },
          dryRun: {
            type: "boolean",
            description:
              "If true, returns planned ticket list without creating issues.",
          },
          maxTickets: {
            type: "number",
            description:
              "Cap on the number of tickets to create in a single call.",
          },
        },
      },
    },
```
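To make the declared semantics concrete, the sketch below shows one way a handler could combine `minScore` and `maxTickets` when choosing pages. The worker's actual logic (worker.ts lines 694–703) is not reproduced in this review, so `selectPagesForTickets` and the `ScoredPage` fields other than `page_localization_score` are hypothetical, and the worst-first ordering is an assumption.

```typescript
// Hypothetical selection helper; only page_localization_score comes from the
// report shape quoted in this review. Other names are illustrative.
interface ScoredPage {
  locale: string;
  route: string;
  page_localization_score: number;
}

function selectPagesForTickets(
  pages: readonly ScoredPage[],
  minScore: number,
  maxTickets?: number,
): ScoredPage[] {
  // Keep pages below the threshold, worst first (assumed ordering).
  const below = pages
    .filter((p) => p.page_localization_score < minScore)
    .sort((a, b) => a.page_localization_score - b.page_localization_score);
  if (maxTickets === undefined) return below;
  // Treat negative or fractional caps defensively.
  const cap = Math.max(0, Math.floor(maxTickets));
  return below.slice(0, cap);
}
```

Sorting before slicing means a cap keeps the lowest-scoring pages, which is usually what a caller limiting ticket volume would want.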
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/plugins/examples/plugin-i18n-parity/src/worker.ts
Line: 134-136
Comment:
**In-memory state lost on worker restart with no user-facing warning**
`scanHistory`, `latestScanKey`, and `cachedCompanyId` are module-level variables. Any worker restart (deploy, crash, idle timeout) silently wipes all scan history, causing all subsequent `get-report`, `get-summary`, `get-page-detail`, and `create-tickets` calls to return "No scan report available." The user will only discover this by running `run-scan` again.
Consider at minimum logging a warning on setup that state is volatile, or surfacing `scannedAt` in `onHealth()` so operators can detect a cold worker. A future improvement would be persisting to `ctx.kv` or similar storage.
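A minimal sketch of the health-reporting idea follows. The return type of `onHealth()` is not shown in this review, so `ScanStateHealth` and `describeScanState` are hypothetical; the sketch only relies on the stated fact that scan history is keyed by the report's `generated_at` value.

```typescript
// Hypothetical health payload; the real onHealth() contract may differ.
interface ScanStateHealth {
  cold: boolean; // true when no scan report is held in memory
  scannedAt: string | null;
  reportsInMemory: number;
  message: string;
}

function describeScanState(
  latestScanKey: string | null,
  history: ReadonlyMap<string, unknown>,
): ScanStateHealth {
  if (latestScanKey === null || !history.has(latestScanKey)) {
    return {
      cold: true,
      scannedAt: null,
      reportsInMemory: history.size,
      message: "No scan in memory. State is volatile and is lost on worker restart; run run-scan.",
    };
  }
  return {
    cold: false,
    // History is keyed by generated_at, so the key doubles as the timestamp.
    scannedAt: latestScanKey,
    reportsInMemory: history.size,
    message: `Latest scan generated at ${latestScanKey}.`,
  };
}
```

An operator polling health could then distinguish a freshly restarted worker from one that simply has no flagged pages.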
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "feat(SUD-1512): add ocho.i18n-parity plu..."
packages/plugins/examples/plugin-i18n-parity/src/worker.ts, lines 541-544:
    ctx.data.register("i18n-parity-report", async () => {
      if (!latestScanKey) return { pages: [], summary: {}, scannedAt: null };
      return scanHistory.get(latestScanKey) ?? { pages: [], summary: {}, scannedAt: null };
    });
packages/plugins/examples/plugin-i18n-parity/src/worker.ts, lines 559-579:
    async (params): Promise<ToolResult> => {
      try {
        const input = params as { locale?: string };
        const config = await getConfig(ctx);
        if (!config.repoPath) return { error: "repoPath is not configured." };

        const report = runScan(config, input.locale, ctx.logger);
        latestScanKey = report.generated_at;
        scanHistory.set(latestScanKey, report);

        const totalPages = report.localization.pages.length;
        const flagged = report.localization.pages.filter((p) => p.still_english_flag).length;
        return {
          content: `Scan complete. ${totalPages} pages across ${report.config.locales_scanned.length} locale(s). ${flagged} still-English pages flagged.`,
          data: report,
        };
      } catch (err) {
        return { error: `run-scan failed: ${err instanceof Error ? err.message : String(err)}` };
      }
    },
    );
packages/plugins/examples/plugin-i18n-parity/src/manifest.ts, lines 116-136:
    {
      name: "create-tickets",
      displayName: "Create Parity Tickets",
      description:
        "Creates Paperclip issues for pages that fall below the minScore threshold from the most recent scan.",
      parametersSchema: {
        type: "object",
        properties: {
          minScore: {
            type: "number",
            description:
              "Override threshold (0–1). Defaults to plugin config minScore.",
          },
          dryRun: {
            type: "boolean",
            description:
              "If true, returns planned ticket list without creating issues.",
          },
        },
      },
    },
packages/plugins/examples/plugin-i18n-parity/src/worker.ts, lines 134-136:
    const scanHistory = new Map<string, V1Report>();
    let latestScanKey: string | null = null;
    let cachedCompanyId: string | null = null;
Summary
Implements the `ocho.i18n-parity` Paperclip plugin (Phase 1 + Phase 2 of SUD-1510 plan).
- Plugin `ocho.i18n-parity` with 5 tools, 3 UI slots, config schema
- Missing locale files scored as 0 (`still_english_flag: true`, `missing: true`)
- v1 report schema: `schema_version`, `generated_at`, `config`, `localization.{summary, pages}`, `analytics: null`, `search_console: null`
- Scan history kept in memory, keyed by `generated_at` ISO string
- Tools: `run-scan`, `get-report`, `get-summary`, `get-page-detail`, `create-tickets`
Validation
- `tsc --noEmit`: clean ✅
- `pnpm build`: passes (`dist/manifest.js`, `dist/worker.js`, `dist/ui/index.js`) ✅
Issue
Closes SUD-1512 / SUD-1511 (scaffold + scanner core + full scan + tools)
🤖 Generated with Claude Code