feat: add sensitive-data redaction before translation requests #31
Merged
adinschmidt merged 7 commits into master on Mar 10, 2026
Add a privacy layer that detects and redacts sensitive data (phone numbers, SSNs, Canadian SINs, email addresses) before any text is sent to translation provider APIs. Content from browser password fields and sensitive form inputs is blocked from translation entirely.

- New shared module src/shared/sensitive.ts with regex-based detection
- Redaction applied at all 3 translation orchestration points in background.ts (getSettingsAndTranslate, getSettingsAndTranslateWithDetection, translateHTMLUnits)
- Content script hides translate button and blocks extraction for password/sensitive form fields
- User-facing toggle in options page (on by default)
- i18n strings added for all 12 locales

Closes #28
Correctness fixes:
- Remove (storage as any) casts; use typed storage.redactionMode
- SSN regex now requires at least one separator (backreference) and uses alphanumeric boundaries to avoid matching product codes/IDs
- SIN regex uses alphanumeric boundaries for the same reason
- Remove input[type="hidden"] from sensitive field detection: hidden inputs are invisible and can't be user-selected
- Narrow selectionOverlapsSensitiveField to search within the nearest <form> only; no longer falls back to document.body
- Add attribute-level redaction so mailto: hrefs and other sensitive data in tag attributes are also redacted before sending to providers
- Parse autocomplete as a token list (split on whitespace) to handle values like "section-checkout cc-number" per MDN spec

Simplifications:
- Extract repeated redaction-and-log block into applyRedaction() helper
- Derive CSS selector from SENSITIVE_INPUT_TYPES/AUTOCOMPLETE_VALUES sets programmatically so they stay in sync
- Short-circuit redactSensitiveHTML for plain text (no < characters)
- Remove PASSWORD_FIELD from SensitiveDataType union (no regex uses it)
- Fix prettier formatting on sensitive.ts and options.ts
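
For reviewers, here is a minimal sketch of the autocomplete token-list parsing and the derived selector mentioned in the commit above. The set contents and the helper names hasSensitiveAutocomplete and buildSensitiveSelector are assumptions for illustration; the actual values in src/shared/sensitive.ts are not shown in this PR text.

```typescript
// Assumed set contents; the real module may list different values.
const SENSITIVE_INPUT_TYPES = new Set<string>(["password"]);
const AUTOCOMPLETE_VALUES = new Set<string>([
  "current-password",
  "new-password",
  "cc-number",
]);

// Hypothetical helper: treat autocomplete as a whitespace-separated token
// list, so "section-checkout cc-number" is caught via its "cc-number" token.
function hasSensitiveAutocomplete(value: string | null): boolean {
  if (!value) return false;
  return value
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .some((token) => AUTOCOMPLETE_VALUES.has(token));
}

// Hypothetical helper: build the CSS selector from the same sets so the
// selector and the sets cannot drift apart.
// [attr~="x"] matches x as one whitespace-separated token of the attribute.
function buildSensitiveSelector(): string {
  const byType = Array.from(SENSITIVE_INPUT_TYPES).map(
    (t) => `input[type="${t}"]`,
  );
  const byAutocomplete = Array.from(AUTOCOMPLETE_VALUES).map(
    (v) => `[autocomplete~="${v}"]`,
  );
  return byType.concat(byAutocomplete).join(", ");
}
```

Under these assumptions, hasSensitiveAutocomplete("section-checkout cc-number") returns true, while a plain equality check against the whole attribute value would miss it.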
Revert attribute redaction in redactSensitiveHTML: rewriting href values (e.g. mailto:user@example.com → mailto:[REDACTED:EMAIL]) breaks anchor links in translated pages because the full-page pipeline preserves and re-applies original attributes to the DOM. Tags are now passed through unchanged, with a detailed JSDoc comment explaining the trade-off and the correct future path (redact-then-restore) if attribute-level privacy becomes a requirement.

Also adds a backreference to the SIN regex so separators must be consistent, matching the SSN regex (mixed separators such as 123-456 789 no longer match).
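
The separator backreference and alphanumeric boundaries described in these commits could look roughly like the sketch below. The patterns are assumptions rather than the ones in sensitive.ts: the allowed separators (hyphen or space), the required separator for SIN, the [REDACTED:SIN] label, and the helper name redactIds are all illustrative.

```typescript
// Assumed patterns; the PR text does not show the real regexes.
// (^|[^A-Za-z0-9]) and (?![A-Za-z0-9]) act as alphanumeric boundaries, so
// digits embedded in product codes such as "A123-45-6789" are left alone.
// ([- ]) captures the first separator and \2 requires the second separator
// to be the same character, which rejects mixed forms like "123-456 789".
const SSN_PATTERN = /(^|[^A-Za-z0-9])\d{3}([- ])\d{2}\2\d{4}(?![A-Za-z0-9])/g;
const SIN_PATTERN = /(^|[^A-Za-z0-9])\d{3}([- ])\d{3}\2\d{3}(?![A-Za-z0-9])/g;

// Hypothetical helper: replace matches with a placeholder, keeping the
// boundary character that the leading group consumed.
function redactIds(text: string): string {
  return text
    .replace(SSN_PATTERN, (_match, lead: string) => `${lead}[REDACTED:SSN]`)
    .replace(SIN_PATTERN, (_match, lead: string) => `${lead}[REDACTED:SIN]`);
}
```

The leading capture group stands in for a lookbehind so the sketch also compiles for older targets; a lookbehind such as (?<![A-Za-z0-9]) would express the same boundary where ES2018 regex features are available.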
Summary
- Blocks translation of content from password fields and sensitive form inputs (e.g. autocomplete="current-password", cc-number)
- New src/shared/sensitive.ts module with regex-based pattern detection and HTML-aware redaction that preserves tag structure

How it works
- redactSensitiveHTML() splits HTML into tag/text segments, applies patterns only to text nodes, and preserves all markup
- Redaction is applied at the 3 translation orchestration points in background.ts that feed all API call paths, so no translation request can bypass it
- In content.ts, the translate button is hidden and extractSelectedHtml() returns null when the selection touches a sensitive form field
- Redacted values are replaced with placeholders such as [REDACTED:PHONE] so the LLM knows content was intentionally removed

Test plan
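- bun run build succeeds with no errors
- Text containing a phone number → verify it is replaced with [REDACTED:PHONE]
- Text containing an SSN → verify it is replaced with [REDACTED:SSN]
- Text containing an email address → verify it is replaced with [REDACTED:EMAIL]
- Text with numbers inside <b> tags → verify HTML tags preserved and numbers redacted

Closes #28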
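
The tag/text split described under "How it works" can be sketched as follows. This is not the code from src/shared/sensitive.ts: the email pattern is a simplified stand-in, and the names PATTERNS, redactText, and redactSensitiveHTMLSketch are hypothetical.

```typescript
// Illustrative pattern only; the real module's regexes are not shown here.
const PATTERNS: Array<{ label: string; regex: RegExp }> = [
  { label: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
];

// Replace every match of every pattern with a [REDACTED:<label>] placeholder.
function redactText(text: string): string {
  return PATTERNS.reduce(
    (out, { label, regex }) => out.replace(regex, `[REDACTED:${label}]`),
    text,
  );
}

function redactSensitiveHTMLSketch(html: string): string {
  // Plain text has no tags to protect, so redact it directly.
  if (html.indexOf("<") === -1) return redactText(html);
  // The capturing group makes split() keep the tags in the result, so
  // odd indices hold tags and even indices hold the text between them.
  return html
    .split(/(<[^>]*>)/)
    .map((segment, i) => (i % 2 === 1 ? segment : redactText(segment)))
    .join("");
}
```

Because tags pass through untouched, an input like `<a href="mailto:user@example.com">user@example.com</a>` keeps its href while the visible text becomes [REDACTED:EMAIL], which is the trade-off the revert commit describes.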