
feat: add sensitive-data redaction before translation requests #31

Merged: adinschmidt merged 7 commits into master from feature/sensitive-data-redaction on Mar 10, 2026



@adinschmidt (Owner)

Summary

  • Adds a privacy layer that detects and redacts sensitive data (phone numbers, US SSNs, Canadian SINs, email addresses) before text is sent to any translation provider API
  • Blocks translation entirely when the user's selection overlaps browser password fields or sensitive form inputs (e.g., autocomplete="current-password", cc-number)
  • New src/shared/sensitive.ts module with regex-based pattern detection and HTML-aware redaction that preserves tag structure
  • User-facing "Redact sensitive data" toggle in the options page, on by default
  • i18n strings added for all 12 supported locales

How it works

  1. Regex patterns match phone numbers, SSNs (with IRS-rule exclusions), Canadian SINs, and emails
  2. redactSensitiveHTML() splits HTML into tag/text segments, applies patterns only to text nodes, preserves all markup
  3. Redaction is applied at the 3 orchestration functions in background.ts that feed all API call paths — no translation request can bypass it
  4. In content.ts, the translate button is hidden and extractSelectedHtml() returns null when the selection touches a sensitive form field
  5. Matches are replaced with typed markers like [REDACTED:PHONE] so the LLM knows content was intentionally removed
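
Steps 2 and 5 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/shared/sensitive.ts: the patterns are simplified stand-ins (no IRS exclusions, no SIN pattern), `redactSensitiveText` is a hypothetical helper name, and the tag splitter assumes attribute values never contain a literal `>`.

```typescript
// Simplified stand-in patterns (assumption); the real module is stricter.
type SensitiveDataType = "EMAIL" | "SSN" | "PHONE";

interface SensitivePattern {
  type: SensitiveDataType;
  regex: RegExp;
}

const PATTERNS: SensitivePattern[] = [
  { type: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { type: "SSN", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { type: "PHONE", regex: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g },
];

// Hypothetical helper: replace every match with a typed marker.
function redactSensitiveText(text: string): string {
  let out = text;
  for (const pattern of PATTERNS) {
    out = out.replace(pattern.regex, `[REDACTED:${pattern.type}]`);
  }
  return out;
}

function redactSensitiveHTML(html: string): string {
  // Short-circuit: with no "<" there are no tags to protect.
  if (html.indexOf("<") === -1) return redactSensitiveText(html);

  // Splitting on a capturing group keeps the tags in the result:
  // even indices are text segments, odd indices are tags.
  return html
    .split(/(<[^>]*>)/)
    .map((segment, index) =>
      index % 2 === 1 ? segment : redactSensitiveText(segment)
    )
    .join("");
}
```

With these stand-in patterns, `Call <b>555-123-4567</b>` becomes `Call <b>[REDACTED:PHONE]</b>`, and tag attributes such as `href="mailto:..."` pass through untouched.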

Test plan

  • bun run build succeeds with no errors
  • Load extension → Options → verify "Redact sensitive data" toggle is visible and defaults to ON
  • Select text containing a phone number (e.g., "Call 555-123-4567") → translate → verify console shows redaction log and API receives [REDACTED:PHONE]
  • Select text with SSN format (e.g., "SSN: 123-45-6789") → verify it's redacted as [REDACTED:SSN]
  • Select text with email → verify redacted as [REDACTED:EMAIL]
  • Navigate to a login page → select text from a password input → verify the translate button does NOT appear
  • Full-page translate on a page with phone numbers inside <b> tags → verify HTML tags preserved and numbers redacted
  • Toggle redaction OFF in settings → verify sensitive data passes through unchanged

Closes #28

Add a privacy layer that detects and redacts sensitive data (phone
numbers, SSNs, Canadian SINs, email addresses) before any text is
sent to translation provider APIs. Content from browser password
fields and sensitive form inputs is blocked from translation entirely.

- New shared module src/shared/sensitive.ts with regex-based detection
- Redaction applied at all 3 translation orchestration points in
  background.ts (getSettingsAndTranslate, getSettingsAndTranslateWithDetection,
  translateHTMLUnits)
- Content script hides translate button and blocks extraction for
  password/sensitive form fields
- User-facing toggle in options page (on by default)
- i18n strings added for all 12 locales

Closes #28

Correctness fixes:
- Remove (storage as any) casts; use typed storage.redactionMode
- SSN regex now requires at least one separator (backreference) and
  uses alphanumeric boundaries to avoid matching product codes/IDs
- SIN regex uses alphanumeric boundaries for the same reason
- Remove input[type="hidden"] from sensitive field detection — hidden
  inputs are invisible and can't be user-selected
- Narrow selectionOverlapsSensitiveField to search within the nearest
  <form> only; no longer falls back to document.body
- Add attribute-level redaction so mailto: hrefs and other sensitive
  data in tag attributes are also redacted before sending to providers
- Parse autocomplete as a token list (split on whitespace) to handle
  values like "section-checkout cc-number" per MDN spec

Simplifications:
- Extract repeated redaction-and-log block into applyRedaction() helper
- Derive CSS selector from SENSITIVE_INPUT_TYPES/AUTOCOMPLETE_VALUES
  sets programmatically so they stay in sync
- Short-circuit redactSensitiveHTML for plain text (no < characters)
- Remove PASSWORD_FIELD from SensitiveDataType union (no regex uses it)
- Fix prettier formatting on sensitive.ts and options.ts
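
Deriving the selector from the sets, as in the second item above, could be done along these lines. The set contents and the name `buildSensitiveSelector` are hypothetical; only the idea of generating the selector from the two sets comes from the commit message.

```typescript
// Hypothetical set contents (assumption).
const SENSITIVE_INPUT_TYPES = new Set<string>(["password"]);
const SENSITIVE_AUTOCOMPLETE_VALUES = new Set<string>([
  "current-password",
  "cc-number",
]);

// Build one CSS selector from both sets so the selector cannot drift
// out of sync with the detection logic.
function buildSensitiveSelector(): string {
  const typeSelectors = Array.from(
    SENSITIVE_INPUT_TYPES,
    (type) => `input[type="${type}"]`
  );
  // [attr~="value"] matches a single token in a whitespace-separated list.
  const autocompleteSelectors = Array.from(
    SENSITIVE_AUTOCOMPLETE_VALUES,
    (value) => `[autocomplete~="${value}"]`
  );
  return typeSelectors.concat(autocompleteSelectors).join(", ");
}
```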

Revert attribute redaction in redactSensitiveHTML: rewriting href
values (e.g. mailto:user@example.com → mailto:[REDACTED:EMAIL])
breaks anchor links in translated pages because the full-page
pipeline preserves and re-applies original attributes to the DOM.

Tags are now passed through unchanged with a detailed JSDoc comment
explaining the trade-off and the correct future path (redact-then-
restore) if attribute-level privacy becomes a requirement.
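
One possible shape for the redact-then-restore approach mentioned above is sketched here. Everything in it is an assumption for illustration: the indexed marker format `[REDACTED:EMAIL:n]`, the function names, and the email-only scope are not taken from the PR or its JSDoc.

```typescript
interface RedactionResult {
  redacted: string;
  originals: string[];
}

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace each email with an indexed marker and remember the original.
function redactWithRestore(input: string): RedactionResult {
  const originals: string[] = [];
  const redacted = input.replace(EMAIL, (match: string) => {
    originals.push(match);
    return `[REDACTED:EMAIL:${originals.length - 1}]`;
  });
  return { redacted, originals };
}

// After the provider responds, swap the markers back for the originals.
// Unknown indices are left as-is rather than replaced with "undefined".
function restore(output: string, originals: string[]): string {
  return output.replace(
    /\[REDACTED:EMAIL:(\d+)\]/g,
    (marker: string, index: string) => {
      const original = originals[Number(index)];
      return original !== undefined ? original : marker;
    }
  );
}
```

The provider would only ever see the marker, while the restored attribute keeps anchors such as mailto: links working in the translated page.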

Also adds a backreference to the SIN regex so that both separators must
be the same, matching the SSN behavior (mixed separators such as
123-456 789 no longer match).
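
The separator backreference and alphanumeric boundaries described in these commits can be illustrated as below. The exact patterns in the PR are not shown on this page, so these are assumptions: the separator set `[- ]`, a required separator for SINs, the specific IRS exclusions encoded here (area 000, 666, 9xx; group 00; serial 0000), and the function names.

```typescript
// (?:^|[^A-Za-z0-9]) and (?![A-Za-z0-9]) act as alphanumeric boundaries,
// so digits embedded in product codes or IDs are not matched.
// ([- ]) captures the first separator and \1 requires the same one again.
const SSN_REGEX =
  /(?:^|[^A-Za-z0-9])(?!000|666|9\d\d)\d{3}([- ])(?!00)\d{2}\1(?!0000)\d{4}(?![A-Za-z0-9])/;

const SIN_REGEX = /(?:^|[^A-Za-z0-9])\d{3}([- ])\d{3}\1\d{3}(?![A-Za-z0-9])/;

function containsSSN(text: string): boolean {
  return SSN_REGEX.test(text);
}

function containsSIN(text: string): boolean {
  return SIN_REGEX.test(text);
}
```

Under these assumptions, `123-45-6789` and `123 456 789` match, while `123456789`, `SKU123-45-6789`, and the mixed-separator `123-456 789` do not.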

@adinschmidt merged commit 5cb035e into master on Mar 10, 2026
2 checks passed
@adinschmidt deleted the feature/sensitive-data-redaction branch on Mar 10, 2026 at 04:22
