CI: totem-lint SARIF uploads silently rejected since 2026-04-06 — unpaired-surrogate escapes in frozen rule totem/5afaf8d03f059a41 #2296

@satur8d

Symptom

Every totem-lint SARIF upload to GitHub Code Scanning has been silently rejected since 2026-04-06. GitHub's processing fails with:

Invalid SARIF document: unexpected end of hex escape at line 1 column 126526

Nothing surfaces: the Upload SARIF to GitHub Security step reports success (the rejection happens in GitHub's asynchronous processing, and the step is continue-on-error: true per #1226), the Lint job is green, and the error exists only inside the step's log group. It was exposed only because CodeRabbit scrapes check-run annotations into its review summaries — the same quote appeared on #2294 and #2295 at the identical byte column, which is what proved it diff-independent.

Last successful totem-lint analysis (gh api repos/mmnto-ai/totem/code-scanning/analyses): 2026-04-06T03:15:32Z. The audit-trail category the lint.yml comment says is "kept ON deliberately" (the #474 false-positive measurement trail) has been dark for ~3 months.

Root cause

A frozen compiled rule — totem/5afaf8d03f059a41 (emoji-in-markdown detector, category architecture, fileGlobs **/*.md, **/*.mdx) — carries a regex pattern whose character class encodes astral ranges as UTF-16 surrogate-pair ranges, leaving unpaired surrogates in the pattern string (e.g. …\uD83C-…-\uDFFF… split around -).

The failure chain:

  1. totem lint --format sarif embeds the raw pattern string in the rule's properties.pattern. As a JS string this is valid (lone surrogates are legal in UTF-16 strings); our own emitted file is valid JSON (verified locally with JSON.parse).
  2. github/codeql-action/upload-sarif re-serializes the document (fingerprint injection) with well-formed JSON.stringify, which emits lone surrogates as bare \ud83c-style escapes on a single line.
  3. GitHub's SARIF ingestion parser reads the high-surrogate escape, expects the low-surrogate continuation, hits the - range character instead → unexpected end of hex escape. Document rejected; no analysis recorded.
  4. continue-on-error: true + processing-failure-not-step-failure ⇒ fully silent.

The column is constant across PRs because it falls inside the serialized rules array (387 frozen rules — byte-identical prefix across runs); results (diff-dependent) serialize after.

Reproduced locally: generate the SARIF (node packages/cli/dist/index.js lint --format sarif --out …), JSON.stringify(JSON.parse(file)), inspect around column 126526 → the surrogate-range pattern sits exactly there.

Timing corroboration: the rule's mint window matches the 2026-04-06 last-good analysis.

Why the rule itself is NOT broken

JS regex without the u flag matches by UTF-16 code unit — surrogate-range classes function as authored. Enforcement is unaffected. Only the SARIF export path chokes, so the fix is serialization-side and does not require recompiling frozen rules (compile freeze unaffected; kin to #2055/#2289, not blocked by them).
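
The code-unit behavior can be checked directly. This is a sketch using a hypothetical stand-in for the rule's character class (the real pattern is not reproduced here):

```javascript
// Hypothetical stand-in for the frozen rule's character class.
const source = '[\uD83C-\uD83E][\uDC00-\uDFFF]';

// Without the u flag the engine matches UTF-16 code units, so the two
// classes match the high and low halves of an astral character.
const legacy = new RegExp(source);
const matchesEmoji = legacy.test('Release notes \u{1F389}'); // true (U+1F389 = D83C DF89)
const matchesAscii = legacy.test('Release notes');           // false

// With the u flag the input is read as code points, so the pair is one
// unit and the lone-surrogate classes no longer match it.
const unicodeMode = new RegExp(source, 'u');
const matchesEmojiU = unicodeMode.test('Release notes \u{1F389}'); // false
```

This is why adding the u flag to such a pattern would change its behavior, while sanitizing only the exported copy of the string leaves enforcement untouched.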

Fix direction

  1. SARIF serializer (lint --format sarif): sanitize rule-metadata strings to well-formed Unicode before embedding, either with String.prototype.toWellFormed() (Node ≥ 20) on pattern/description fields or by replacing unpaired surrogates with U+FFFD (the replacement character) by hand. The pattern property is documentation in the SARIF context, not executable, so lossy well-forming is safe.
  2. Make future rejections visible: the upload step should poll processing status (codeql-action supports wait-for-processing: true — verify whether it is on and whether its failure is eaten by continue-on-error), or a follow-up step should assert a new analysis landed for the totem-lint category. Three months of silent darkness is the actual defect; the surrogate is just today's instance.
  3. Verify: post-fix, gh api "repos/mmnto-ai/totem/code-scanning/analyses?per_page=100" shows a fresh totem-lint analysis.

Relations

🤖 Generated with Claude Code
