Ensure sanitizing algorithms can be implemented efficiently #259

smaug---- · 2025-02-24T21:18:07Z

This is a rather vague issue, but it isn't immediately clear how much overhead sanitizing algorithms will add to normal parsing.
innerHTML setter implementations are highly optimized in browsers, and would be good if setHTML could be too (as much as possible).
I wouldn't be surprised if setHTML implementations are first a bit less optimized, but hopefully they could get better over time, assuming the spec algorithms allow that.

@hsivonen, @mozfreddyb

mozfreddyb · 2025-02-25T09:10:09Z

So far, we have mostly treated that as an implementation detail rather than a result of how the spec is written.
Are there any specific improvements that need to be taken into consideration when specifying?
If so, we'd be happy to take them into considerations.

smaug---- · 2025-02-28T10:52:27Z

I'm mostly thinking that can the algorithms work while parsing, or is there something which requires them to run on the result document fragment. (the latter being possibly slow)

annevk · 2025-02-28T10:59:17Z

I think "replace" might be tricky to implement during parsing. Imagine <table><div><td> as input with "replace" set to table elements. We currently require the div to end up as a sibling to the tbody element (which contains a tr element which contains the td element). And I don't think we'd want it any other way. (We should add a test for this.)

It's probably not impossible though and there's nothing in sanitization that depends on things that are later in the tree, it's just that the HTML parser sometimes creates trees in weird ways.

annevk · 2025-02-28T11:01:41Z

The above also made me realize that the act of sanitization can result in mutation XSS as sanitization is mostly blind to the realities of HTML syntax. I think that's okay though. If you're going to replace a table element you are asking for a bad time and it's really hard to stop people from asking for a bad time if they want one.

otherdaniel · 2025-03-03T17:44:26Z

For our past implementation, I had measured sanitization as being much faster than parsing so that the total added time was fairly low. (Assuming the config has already been processed. The current API makes it fairly easy to re-process the config every time.) The implementation wasn't terribly sophisticated, only pre-processing the config into hash tables / hash sets, and using the browser's existing internal abstraction for QNames (which are fast to compare).

The new implementation isn't ready yet, but I expect much the same.

I think an incremental implementation within the HTML parser should be possible: We currently use an in-order traversal of the resulting DOM tree, but all decisions regarding any specific DOM node are made when reaching that node. It should be possible to convert that into an incremental algorithm where that same decision is made just when that particular node has finished parsing. That'd likely increase the maintenance burden by a good bit, though. My impression is that the HTML parser is already fairly complicated, and additional complexity will require an argument.

mozfreddyb · 2025-03-05T09:19:28Z

We discussed this as a group and want to keep this as the current architecture where parser & sanitizer are separate issues. @annevk has agreed to submit a test-case to the sanitizer-api tests in wpt such that tree construction oddities (the <table><div><td> example) are working as expected.

mozfreddyb added the discussion label Feb 25, 2025

mozfreddyb assigned annevk Mar 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure sanitizing algorithms can be implemented efficiently #259

Ensure sanitizing algorithms can be implemented efficiently #259

smaug---- commented Feb 24, 2025

mozfreddyb commented Feb 25, 2025

smaug---- commented Feb 28, 2025 •

edited

Loading

annevk commented Feb 28, 2025 •

edited

Loading

annevk commented Feb 28, 2025

otherdaniel commented Mar 3, 2025

mozfreddyb commented Mar 5, 2025

Ensure sanitizing algorithms can be implemented efficiently #259

Ensure sanitizing algorithms can be implemented efficiently #259

Comments

smaug---- commented Feb 24, 2025

mozfreddyb commented Feb 25, 2025

smaug---- commented Feb 28, 2025 • edited Loading

annevk commented Feb 28, 2025 • edited Loading

annevk commented Feb 28, 2025

otherdaniel commented Mar 3, 2025

mozfreddyb commented Mar 5, 2025

smaug---- commented Feb 28, 2025 •

edited

Loading

annevk commented Feb 28, 2025 •

edited

Loading