Commit 8132efa

Carlos Scheidegger and claude committed
feat: math-mode rendering for HTML output via MathJax + KaTeX (bd-w5ov)
Auto-typeset math equations in rendered HTML through a CDN-served MathJax (default) or KaTeX engine, matching Quarto 1's user-facing `html-math-method:` surface but routed through q2's own pipeline (q2 does not invoke Pandoc).

Predicate is content-driven: a new MathJsStage walks the post-transform AST and, when at least one `Inline::Math` node is found (including inside `CustomNode("Equation")` slots produced by EquationLabelTransform for `$$…$$ {#eq-foo}` syntax), populates `meta.math` with the engine's inline config block plus the loader `<script>` tag. A new `$math$` template slot in the built-in HTML templates renders this verbatim into `<head>` immediately before `$for(scripts)$`, so the inline config block always lands BEFORE the loader (what MathJax requires).

Engine and URL are read from `html-math-method:`. Both bare-string form (`html-math-method: katex`) and Pandoc-compatible object form (`html-math-method: { method: mathjax, url: "..." }`) are honored. URL override skips the default CDN entirely.

The `\tag{N}` injection that CrossrefRenderTransform performs for labelled equations works end-to-end; live Chromium smoke confirmed equation numbering and crossref linking through MathJax 3.2.2 from jsDelivr.

CDN-default mirrors Pandoc and Quarto 1: neither vendors MathJax bytes today, and shipping a ~10-15 MB component bundle in the q2 binary was rejected on size grounds. A future `quarto install mathjax` helper for offline rendering is filed as bd-hva0.

Hub-client / WASM: stage is wired into `build_wasm_html_pipeline()` but math does NOT typeset in the preview today because the iframe sandbox excludes `allow-scripts` (deliberately, for safety against user-pasted scripts in qmd). Discovery + rationale recorded in plan §4.6; tracked as bd-but3, blocked on the existing service-worker iframe-isolation work.

Tests: 11 new unit tests in math_js.rs (predicate matrix + config-parser cases) + 7 integration tests in math_mode_pipeline.rs (mathjax default / display / labelled / katex / custom URL / math-free / multi-page website). Pipeline-structure tests updated (native: 18→19 stages, WASM: 16→17). Workspace 8403/8403 pass; full `cargo xtask verify` (9 steps including hub-client + WASM) green.

Plan: claude-notes/plans/2026-05-04-math-mode.md
Predecessor: bd-4eyf (Bootstrap JS injection; established the predicate-driven stage pattern this builds on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent eb4181c commit 8132efa

8 files changed

Lines changed: 2390 additions & 9 deletions

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
# Math-mode (MathJax / KaTeX / …) implementation: handoff from bd-4eyf

**Status:** Not started. Notes for the next session.
**Predecessor work:** bd-4eyf (Bootstrap JS injection); see
`claude-notes/plans/2026-05-04-bootstrap-js-injection.md`.

This document captures findings from the bd-4eyf session that are
directly load-bearing for math-mode work. Read this first; it will
let you skip the learning loop bd-4eyf went through.

## TL;DR: what changes vs. the Bootstrap JS approach

Quarto 1 delegates math-mode injection entirely to Pandoc by setting
`html-math-method: mathjax` (or `katex`, `webtex`, …) in Pandoc
options. q2 cannot reuse that mechanism because **q2's HTML pipeline
does not invoke Pandoc**: we render HTML ourselves. So q2 needs its
own injection path.

The bd-4eyf "predicate → register `js:*` artifact" pattern *almost*
fits, but math has two complications Bootstrap didn't:

1. **An inline configuration block** is required (e.g. `<script>window.MathJax = { ... }</script>`) before the loader script. The artifact pipeline only emits external `<script src="…">` tags; it has no path for inline content.
2. **A trigger that depends on document content**, not just metadata. "Does this document contain math?" requires an AST walk; bd-4eyf's predicate (`is_minimal_html` + `theme_config.suppress_bootstrap`) is metadata-only and runs in microseconds.

These two together are why bd-4eyf deferred the generic `JsFeature`
abstraction: math is the case where it might actually pay for itself.

## Architectural pieces you'll touch

Read these in roughly this order before designing:

- **`crates/quarto-core/src/stage/stages/bootstrap_js.rs`**: the prototype to copy. The "predicate → `Project`-scoped `js:` artifact" pattern is documented in its module-level doc comment.
- **`crates/quarto-core/src/stage/stages/apply_template.rs:166-167, 313`**: `collect_artifact_urls` is what turns `js:*` artifacts into `<script src="…">` tags. **It only handles external scripts** (artifacts with a `path`); inline content has no slot here. This is the design pinch-point for math.
- **`crates/quarto-core/src/stage/stages/include_resolve.rs`** + the `rendered.includes.{header, before-body, after-body}` contract: *this* is how raw HTML (including inline `<script>` blocks) currently reaches the rendered template. The MathJax config script may need to ride this rail rather than the artifact rail.
- **`crates/quarto-core/src/pipeline.rs`**: `build_html_pipeline_stages_with_options()` (native) and `build_wasm_html_pipeline()` (hub-client). The `#[cfg(not(target_arch = "wasm32"))]` gate pattern bd-4eyf established for omitting native-only stages from WASM is the cleanest precedent. Decide upfront whether math should ship to hub-client (probably yes: math display does not have the iframe-reinit-stateful-component problem Bootstrap does).
- **`crates/quarto-core/src/format.rs:278`**: the `is_minimal_html` predicate. **Important gotcha** documented in `bootstrap_js.rs`: this reads root-level `theme:` only; format-nested `format.html.theme: none` is *not* flattened to root by `MetadataMergeStage`. Use `quarto_sass::ThemeConfig::from_config_value(&doc.ast.meta).suppress_bootstrap` for the canonical "Bootstrap is in use" check, or *combine* both predicates as bd-4eyf did. Math probably wants its own predicate, but if it ever depends on the theme decision, use the same combined approach.
- **`resources/scss/README.md`** + `resources/js/README.md`: the vendoring conventions. If you vendor MathJax, mirror the layout under `resources/js/mathjax/` and document the source URL + version + bump policy.

## The trigger question (this is the hard one)

Bootstrap JS triggers on metadata: "is a Bootstrap-backed theme
active?", which is checkable in O(1) on the format/document meta.

Math triggers on **content**: "does this document contain at least one
`Math` element in the AST?", which requires a walk. Options:

1. **Walk in a dedicated stage** that runs late enough to see the final AST (after engines / sugaring / transforms; those can introduce math via crossref equations). This is the pure approach but adds an O(N) AST walk to every render.
2. **Piggyback on an existing walker.** `RenderHtmlBodyStage` already traverses every node to emit HTML. Set a flag on `StageContext` when it sees a `Math` element. Then a tiny stage between `RenderHtmlBodyStage` and `ApplyTemplateStage` reads the flag and registers artifacts. Cheaper, but more tightly coupled: the renderer becomes responsible for a side-channel signal.
3. **Use the document profile.** `DocumentProfileStage` already snapshots the AST at the checkpoint; `EquationLabelTransform` already counts equation labels (`crates/quarto-core/src/transforms/`). If `profile.has_math` (new field) gets set during profiling, math injection becomes a metadata-style predicate again. Cleanest if the profile contract can be extended cheaply.

Recommendation: option 3 if `DocumentProfile` already sees post-sugar
crossref equations; otherwise option 2. Option 1 is the most
expensive and probably unnecessary.

**Don't forget:** `EquationLabelTransform` introduces `Math` blocks
*from* `$$ … $$ {#eq-…}` syntax during sugaring. The trigger walk must
run *after* this transform, or it will miss labelled equations.
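
A minimal sketch of the content-driven trigger (option 1, the dedicated walk), written against a hypothetical, simplified AST. The `Inline` and `Block` enums and both function names are illustrative stand-ins, not q2's real node types; only the shape of the recursion is the point. Code blocks and inline code are treated as non-math here, which is an assumption about the desired behavior.

```rust
// Hypothetical, simplified AST. q2's real node types are richer.
#[allow(dead_code)]
#[derive(Debug)]
enum Inline {
    Str(String),
    Code(String),
    Math(String),
    Emph(Vec<Inline>),
}

#[allow(dead_code)]
#[derive(Debug)]
enum Block {
    Para(Vec<Inline>),
    CodeBlock(String),
    // Stand-in for container nodes such as callouts or custom nodes.
    Div(Vec<Block>),
}

/// Returns true as soon as any math node is found among the inlines.
fn inlines_contain_math(inlines: &[Inline]) -> bool {
    inlines.iter().any(|inline| match inline {
        Inline::Math(_) => true,
        Inline::Emph(children) => inlines_contain_math(children),
        // Dollar signs inside inline code are text, not math.
        Inline::Str(_) | Inline::Code(_) => false,
    })
}

/// Walks blocks depth-first and stops at the first math node.
fn blocks_contain_math(blocks: &[Block]) -> bool {
    blocks.iter().any(|block| match block {
        Block::Para(inlines) => inlines_contain_math(inlines),
        Block::Div(children) => blocks_contain_math(children),
        Block::CodeBlock(_) => false,
    })
}
```

Because `any` short-circuits, the walk costs O(N) only for math-free documents; a document with math near the top exits early. The same predicate could back a `has_math` profile field if option 3 turns out to be viable.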

## The inline config-script question

MathJax wants something like:

```html
<script>
  window.MathJax = {
    tex: { inlineMath: [['$', '$'], ['\\(', '\\)']] },
    //
  };
</script>
<script src="…/mathjax.js" defer></script>
```

The artifact pipeline can emit the external `<script>` (`js:mathjax`
artifact, same shape as `js:bootstrap`) but **not the inline config
block**. Three plausible paths:

1. **Use `rendered.includes.header`** for the inline block: the same rail `IncludeResolveStage` uses for raw HTML. The math stage would push a string into `meta.rendered.includes.header` and the artifact handles only the external script. Two halves, same destination (`<head>`), but they're decoupled in the code.
2. **Extend the artifact API** with an "inline" variant: `Artifact::inline_script(content)`, which emits a `<script>…</script>` directly rather than `<script src="…">`. Touches the `collect_artifact_urls` contract. Probably the most invasive option.
3. **Bake the config into the loader.** Vendor a small wrapper script (`mathjax-init.js`) that does `window.MathJax = {...}; document.write('<script src="mathjax.js">…')` or similar. Single artifact, no template-side change, but harder to make the config user-controllable.

Recommendation: option 1 (`rendered.includes.header` for inline,
`js:mathjax` artifact for external). It reuses two existing rails
without inventing a third. The "two halves" criticism is worth
~5 lines of doc, not a refactor.
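
Whichever rail carries it, the ordering constraint is the same: the inline config block has to appear before the loader tag. A rough sketch of a helper that produces both halves as one string, assuming a hypothetical `mathjax_head_fragment` function and a caller-supplied loader URL (neither exists in q2 under these names):

```rust
/// Minimal attribute escaping so a user-supplied URL cannot break out of
/// the `src="..."` attribute. `&` must be replaced first.
fn escape_attr(value: &str) -> String {
    value
        .replace('&', "&amp;")
        .replace('"', "&quot;")
        .replace('<', "&lt;")
}

/// Hypothetical helper: builds the HTML destined for <head>.
/// The inline config block is emitted first, then the external loader,
/// so `window.MathJax` is already defined when the loader runs.
fn mathjax_head_fragment(loader_url: &str) -> String {
    // Raw string: the backslashes reach the page exactly as written.
    let config =
        r"window.MathJax = { tex: { inlineMath: [['$', '$'], ['\\(', '\\)']] } };";
    let url = escape_attr(loader_url);
    format!("<script>\n{config}\n</script>\n<script src=\"{url}\" defer></script>\n")
}
```

Under option 1 the first `<script>` would go into `rendered.includes.header` and only the URL would travel as an artifact; the sketch keeps them together purely to make the ordering visible.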

## Vendor vs CDN vs both?

Bootstrap was tiny (80 KB), so vendoring was an easy call. MathJax
is *much* bigger:

- Full MathJax 3 distribution: ~70 MB unpacked (includes every font / output mode / extension).
- Common components-only loader: ~1 MB.
- Smallest bootstrap-loader: ~150 KB.

Quarto 1 vendors the components-only build. The size delta vs Bootstrap
is real: vendoring 1 MB into the CLI binary affects download size and
fresh-clone build time. Decision worth recording up front; both
extremes have precedent.

If you go CDN-default with vendor-fallback, that's a third design
question you didn't have to answer for Bootstrap. The
`is_minimal_html`-style metadata knob (`mathjax.source: cdn|local`)
is the user-facing surface.

## Hub-client / WASM

Unlike Bootstrap (deliberately omitted from WASM because the
iframe-per-render preview blows away stateful Bootstrap components),
math display is **stateless**: typeset on load, done. Hub-client
*should* render math. Don't blindly copy bd-4eyf's
`#[cfg(not(target_arch = "wasm32"))]` gate; for math, both pipelines
get the stage. (Confirm by trying a dollar-sign-equation in
hub-client preview before committing.)

If the size of the vendored MathJax bundle is what's blocking the
WASM bundle, the CDN-default path solves that for free.

## Configuration knobs (scope decision)

Quarto 1 supports `html-math-method: mathjax | katex | webtex | gladtex | mathml | plain`.
Each has different semantics:

- `mathjax`, `katex`: client-side typesetting, ship a JS runtime.
- `webtex`: formulas become `<img>` tags pointing at an external image-rendering service, no JS.
- `gladtex`: formulas are wrapped in `<eq>` tags for post-processing by GladTeX into images, no JS.
- `mathml`: TeX is converted to MathML for the browser to render natively, no JS.
- `plain`: no typesetting engine; math is rendered with plain HTML and Unicode as far as possible, no JS.

q2 does not need to ship all of these on day one. Pick a default
(probably `mathjax`) and an explicit user-controllable knob; defer the
others. **Decide before you start writing code**: the predicate matrix
balloons fast if you support all six.
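
To make the scope decision concrete, here is one possible shape for parsing the knob, assuming a v1 that supports only `mathjax` and `katex` and accepts both a bare string and an object with `method` / `url` keys. `MetaValue`, `MathEngine`, `MathConfig`, and `parse_math_method` are hypothetical names for illustration; q2's real config value type is not shown in these notes.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a parsed YAML metadata value.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum MetaValue {
    String(String),
    Map(BTreeMap<String, MetaValue>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MathEngine {
    MathJax,
    KaTeX,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct MathConfig {
    engine: MathEngine,
    // `None` means "use the engine's default location".
    url: Option<String>,
}

fn parse_engine(name: &str) -> Option<MathEngine> {
    match name {
        "mathjax" => Some(MathEngine::MathJax),
        "katex" => Some(MathEngine::KaTeX),
        // webtex, gladtex, mathml, plain: deferred in this sketch.
        _ => None,
    }
}

/// Returns `None` when the requested method is not a supported JS engine,
/// in which case the caller would inject nothing.
fn parse_math_method(value: Option<&MetaValue>) -> Option<MathConfig> {
    match value {
        // Key absent: fall back to the assumed default engine.
        None => Some(MathConfig { engine: MathEngine::MathJax, url: None }),
        // Bare-string form, e.g. `html-math-method: katex`.
        Some(MetaValue::String(name)) => {
            parse_engine(name).map(|engine| MathConfig { engine, url: None })
        }
        // Object form, e.g. `{ method: mathjax, url: "..." }`.
        Some(MetaValue::Map(map)) => {
            let engine = match map.get("method") {
                Some(MetaValue::String(name)) => parse_engine(name)?,
                Some(_) => return None,
                None => MathEngine::MathJax,
            };
            let url = match map.get("url") {
                Some(MetaValue::String(url)) => Some(url.clone()),
                _ => None,
            };
            Some(MathConfig { engine, url })
        }
    }
}
```

Keeping the engine as a closed enum means each additional method shows up as a new `match` arm in every consumer, which is one way to keep the predicate matrix visible as it grows.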

## Test strategy

The bd-4eyf test pattern to copy:

- **Unit tests** for the trigger predicate (math present / absent / nested in callout / inside code block / etc.) in the new stage's `#[cfg(test)] mod tests`.
- **Integration tests** in a new `crates/quarto-core/tests/math_mode_pipeline.rs` driving `render_to_file` + `ProjectPipeline`, asserting:
  - Math-bearing render emits the script tag(s) and the on-disk file(s).
  - Math-free render emits neither.
  - Multi-page website ships one shared copy.
  - Nested-page relative URL.
- **Live browser smoke** via chrome-devtools-mcp: actually load a rendered page and assert MathJax typeset something visible (e.g. that a `$x^2$` body produced a `<mjx-container>` in the DOM). The bd-4eyf session proved this works and is the most decisive evidence the runtime is wired up. Don't skip it.

## Snapshot baseline

`crates/quarto-core/tests/fixtures/phase5-single-doc-baseline/expected_hashes.txt` was re-captured for bd-4eyf because the new
`<script>` tag changed `doc.html`'s SHA. The baseline `doc.qmd` is
math-free, so math-mode work should *not* change `doc.html` again:
the math stage must skip on math-free input. If the hash shifts, the
predicate is over-triggering. (Useful canary.)

## bd-telo dependency

`bd-telo` (filed during bd-4eyf): q2 today only reads `navbar:` from
the top level of `_quarto.yml`, not from nested `website.navbar:`. If
math-mode is documented for users in user-facing docs that recommend
the natural `website.navbar:` shape, the docs will not match what
works. Either land bd-telo first, or be careful about the YAML shape
in user-facing docs/examples for math.

## Things that *don't* need re-deriving

bd-4eyf already settled these; don't re-design them:

- **Artifact-based external scripts** are the right shape for the JS payload (uses `ApplyTemplateStage`'s existing `js:` collector, gets the right URL via `ResourceResolverContext`, lands in the project lib dir for websites with the `quarto/` namespace, lands per-page for single-doc).
- **Project scope** is the right scope for math assets (shared across pages in a website, mirrors the theme CSS layout).
- **Vendoring layout under `resources/js/<feature>/`** with an `include_bytes!` is the established pattern; just add a section to `resources/js/README.md`.
- **TDD with a noop stub for red phase** is the local norm; CLAUDE.md mandates it. The failure messages bd-4eyf used (positive cases fail, skip cases pass-via-false-positive) are documented in the bd-4eyf plan.
- **Script ordering** is alphabetic-by-key and there's no SAT solver coming. If math needs to load *after* Bootstrap (it doesn't), pick a key that sorts after `bootstrap`.
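
The alphabetic-by-key ordering can be illustrated with a sorted map. This is only a sketch of the observable behavior: the `script_tags` function and the use of a `BTreeMap` are assumptions for illustration, not a description of how `collect_artifact_urls` is actually written.

```rust
use std::collections::BTreeMap;

/// Hypothetical collector: takes artifact keys (e.g. "js:bootstrap")
/// mapped to URLs and emits one <script> tag per `js:` artifact.
/// A BTreeMap iterates in ascending key order, so the output order is
/// alphabetic by key regardless of insertion order.
fn script_tags(artifacts: &BTreeMap<String, String>) -> Vec<String> {
    artifacts
        .iter()
        .filter(|(key, _)| key.starts_with("js:"))
        .map(|(_, url)| format!("<script src=\"{url}\"></script>"))
        .collect()
}
```

With keys `js:bootstrap` and `js:mathjax`, the Bootstrap tag is emitted first because `b` sorts before `m`; the registration order of the two stages does not matter.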

## Open question for the next session's first message

Before designing, get an answer on:

> Do we want math-mode to support multiple engines (mathjax + katex
> at minimum), or is mathjax-only an acceptable v1?

This single decision shapes the whole stage: single-feature stage vs.
parameterized-by-engine stage vs. one-stage-per-engine. Don't start
without it.
