[lexical] [lexical-html] Bug Fix: Sanitize Invalid Nodes On HTML Paste #7982

duvallj · 2025-11-11T01:06:31Z

Description

In our use of Lexical, we often saw crashes with Lexical error 40, "A ListItemNode must have a ListNode for a parent." We inspected our code for places where we might be manipulating Lexical nodes manually in a way that would lead to this case, but found none. Then I wanted to see if we could break this invariant using just regular Lexical functionality. Turns out we can! Pasting invalid HTML leads to corrupted internal state, which, if left unchecked, can trigger this error code.

This PR sanitizes nodes on HTML paste so that invalid nodes never enter the editor state. I chose this option because:

- Adding an extremely recursive function to the "hot path" of insertNodes or insertAfter seems like a bad idea.
- This is likely how the bug was getting triggered for our users in the first place; Lexical seems to do an OK job of maintaining valid state elsewhere.
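
To make the invariant behind error 40 concrete, here is a minimal sketch that models the two tree shapes from the "invalid list node correction" test. `ModelNode` and `findOrphanListItems` are hypothetical names for illustration only; this is a plain-object model, not Lexical's node classes or API.

```typescript
// Simplified stand-in for editor nodes; these are not Lexical's real classes.
interface ModelNode {
  type: string; // e.g. "quote", "list", "listitem", "text"
  children: ModelNode[];
}

// Return every "listitem" whose direct parent is not a "list".
function findOrphanListItems(root: ModelNode): ModelNode[] {
  const orphans: ModelNode[] = [];
  const visit = (node: ModelNode): void => {
    for (const child of node.children) {
      if (child.type === "listitem" && node.type !== "list") {
        orphans.push(child);
      }
      visit(child);
    }
  };
  visit(root);
  return orphans;
}

const makeItem = (): ModelNode => ({
  type: "listitem",
  children: [{ type: "text", children: [] }],
});

// Shape of the "Received" HTML in the failing run: items directly in a quote.
const corrupted: ModelNode = {
  type: "quote",
  children: [makeItem(), makeItem(), makeItem()],
};

// Shape of the "Expected" HTML: the same items wrapped in a list.
const repaired: ModelNode = {
  type: "quote",
  children: [{ type: "list", children: [makeItem(), makeItem(), makeItem()] }],
};

console.log(
  findOrphanListItems(corrupted).length,
  findOrphanListItems(repaired).length,
);
```

The first tree is the state that can later trip the invariant; the second is what the sanitization is meant to produce.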

Test plan

npm run test-unit

Before

> nix-shell --command "npm run test-unit packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts" -p nodejs corepack
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring

> @lexical/[email protected] test-unit
> vitest --no-watch packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts


 RUN  v3.2.4 /Users/jackduvall/lexical

 ❯  unit  packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts (12 tests | 1 failed) 71ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: plain DOM text node 21ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a paragraph element 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a single div 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: multiple nested spans and divs 4ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested span in a div 3ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested div in a span 2ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: google doc checklist 11ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: github checklist 6ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: joplin checklist 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: pasting inheritance 2ms
   × HTMLCopyAndPaste tests > HTML copy paste: invalid list node correction 5ms
     → expected '<blockquote dir="auto"><li value="1">…' to be '<blockquote dir="auto"><ul><li value=…' // Object.is equality
   ✓ HTMLCopyAndPaste tests > iOS fix: Word predictions should be handled as plain text to maintain selection formatting 1ms

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ Failed Tests 1 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

 FAIL   unit  packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts > HTMLCopyAndPaste tests > HTML copy paste: invalid list node correction
AssertionError: expected '<blockquote dir="auto"><li value="1">…' to be '<blockquote dir="auto"><ul><li value=…' // Object.is equality

Expected: "<blockquote dir="auto"><ul><li value="1"><span data-lexical-text="true">Item A</span></li><li value="2"><span data-lexical-text="true">Item B</span></li><li value="3"><span data-lexical-text="true">Item C</span></li></ul></blockquote>"
Received: "<blockquote dir="auto"><li value="1"><span data-lexical-text="true">Item A</span></li><li value="1"><span data-lexical-text="true">Item B</span></li><li value="1"><span data-lexical-text="true">Item C</span></li></blockquote>"

 ❯ packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts:142:37
    140|             }
    141|           });
    142|           expect(testEnv.innerHTML).toBe(testCase.expectedHTML);
       |                                     ^
    143|         });
    144|       });

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[1/1]⎯


 Test Files  1 failed (1)
      Tests  1 failed | 11 passed (12)
   Start at  20:00:35
   Duration  850ms (transform 311ms, setup 12ms, collect 493ms, tests 71ms, environment 117ms, prepare 22ms)

After

> nix-shell --command "npm run test-unit packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts" -p nodejs corepack
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring

> @lexical/[email protected] test-unit
> vitest --no-watch packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts


 RUN  v3.2.4 /Users/jackduvall/lexical-cursor

 ✓  unit  packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts (12 tests) 65ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: plain DOM text node 21ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a paragraph element 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a single div 4ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: multiple nested spans and divs 3ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested span in a div 3ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested div in a span 2ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: google doc checklist 9ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: github checklist 6ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: joplin checklist 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: pasting inheritance 2ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: invalid list node correction 2ms
   ✓ HTMLCopyAndPaste tests > iOS fix: Word predictions should be handled as plain text to maintain selection formatting 1ms

 Test Files  1 passed (1)
      Tests  12 passed (12)
   Start at  19:53:06
   Duration  810ms (transform 308ms, setup 12ms, collect 481ms, tests 65ms, environment 114ms, prepare 24ms)

meta-cla · 2025-11-11T01:06:37Z

Hi @duvallj!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

vercel · 2025-11-11T01:06:37Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
lexical	Ready	Preview	Comment	Nov 11, 2025 1:08am
lexical-playground	Ready	Preview	Comment	Nov 11, 2025 1:08am

meta-cla · 2025-11-11T02:19:57Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

etrepum

This doesn't seem like a very good approach to solving this problem. It's unexpected and inefficient to call createParentElementNode unless it's known that one needs to be created. Generally, these kinds of normalizations are handled by transforms and/or the importDOM implementations.

duvallj · 2025-11-11T22:41:25Z

@etrepum I think I see what you mean; you're saying I should hook into existing transform machinery by returning a forChild function or after function from an importDOM implementation, is that correct?

I can't see a great way for that to work. (1) The only importDOM we can reasonably modify is the one in ListItemNode, because nothing else has any reason to care about <li> DOM nodes. However, when we do that, (2) it's not enough to use forChild, because that will only apply to children of the <li>, which is too late. And (3) a transform that uses after will have to traverse the tree again to fix up the <li> it just created.

I agree that it's probably not good that we're calling createParentElementNode() on every single node. I should probably restructure things so that it's only called on nodes where isParentRequired() is true. However, once that is true, I think we will always have to call it in order to check that the constructors match up. We could also introduce a new function getParentElementNodeConstructor() on LexicalNode to optimize this further; I just left that off for simplicity's sake.
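
As a rough sketch of that restructuring (not the code in this PR), the pass below only creates a wrapper for children that require a parent of a different type than the one they sit under. It uses a plain-object model rather than Lexical nodes; `requiredParent` is a hypothetical lookup standing in for isParentRequired() plus the constructor comparison, and `sanitize` is an illustrative name.

```typescript
// Simplified stand-in for editor nodes; these are not Lexical's real classes.
interface ModelNode {
  type: string;
  children: ModelNode[];
}

// Hypothetical lookup playing the role of isParentRequired() and
// createParentElementNode(): node type -> parent type it must sit in.
const requiredParent = new Map<string, string>([["listitem", "list"]]);

// Rebuild the tree bottom-up, wrapping only children that need a parent
// of a different type than the one they were imported under.
function sanitize(node: ModelNode): ModelNode {
  const fixed: ModelNode[] = [];
  const created = new Set<ModelNode>(); // wrappers made during this call
  for (const rawChild of node.children) {
    const child = sanitize(rawChild);
    const parentType = requiredParent.get(child.type);
    if (parentType === undefined || parentType === node.type) {
      fixed.push(child); // no parent required, or parent already correct
      continue;
    }
    const last = fixed[fixed.length - 1];
    if (last !== undefined && created.has(last) && last.type === parentType) {
      last.children.push(child); // extend the wrapper for a run of orphans
    } else {
      const wrapper: ModelNode = { type: parentType, children: [child] };
      created.add(wrapper);
      fixed.push(wrapper);
    }
  }
  return { ...node, children: fixed };
}

const newItem = (): ModelNode => ({ type: "listitem", children: [] });
const pasted: ModelNode = {
  type: "quote",
  children: [newItem(), newItem(), newItem()],
};
const sanitized = sanitize(pasted);
console.log(JSON.stringify(sanitized));
```

Consecutive orphans share one wrapper, which matches the single <ul> in the test's expected HTML. Lists that already existed in the input are left alone; only wrappers created during the pass are extended.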

etrepum · 2025-11-11T22:52:21Z

Node Transforms are what I was referring to by "transforms"; they're typically used to provide various sorts of normalization regardless of how the nodes are created.

etrepum · 2025-11-11T22:54:34Z

There are already a few transforms registered for ListNode and ListItemNode, but none of them currently handle this specific scenario. Here's one of them: https://github.com/facebook/lexical/blob/main/packages/lexical-list/src/index.ts#L126-L152

duvallj · 2025-11-11T23:36:27Z

@etrepum Ah OK, those links are helpful, thanks.

I still don't see how a node transform is well-suited to solve the problem at hand, however. There are a couple approaches I see:

1. Register on ListItemNode to surround it with a ListNode if it doesn't have one for a parent already, and register on ListNode to merge with sibling ListNodes.
2. Register on ElementNode to check if it is a non-ListNode with ListItemNode direct children, and fix up if that's the case.

(2) is so inefficient I think we can disregard it. (1) is better, since like you say it ensures that ListItemNodes will be valid no matter how they are inserted into the editor.

It's hard to compare the exact perf differences between "thing that runs once on DOM import" and "thing that runs every time a list is updated." Though I guess, in general, the former might happen more often across the userbase than the latter.

etrepum · 2025-11-11T23:50:57Z

The check for (1) is fairly cheap. There's already code running every time a list item is updated; it just doesn't check exactly what its parent is. Adding an if (!$isListNode(this.getParent())) { … } in there somewhere isn't going to make much of a practical difference. The hard part is deciding what to do when fixing up an invalid doc, which is probably why nobody has implemented a default strategy for it (e.g. split parent block and insert a ListNode where the invalid children are, replace the ListItemNode with its children and a LineBreakNode, drop it altogether, etc).
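
For illustration, here is one possible shape of such a check, sketched on a small mutable tree with parent pointers instead of real Lexical nodes. `ModelNode` and `fixListItem` are hypothetical names, and the fix-up shown (put a list in the item's slot, or join the list directly before it) is only one of the candidate strategies, picked here as an assumption.

```typescript
// Simplified mutable tree with parent pointers; not Lexical's real classes.
class ModelNode {
  type: string;
  parent: ModelNode | null = null;
  children: ModelNode[] = [];

  constructor(type: string) {
    this.type = type;
  }

  // Detach from the current parent, if any.
  remove(): void {
    if (this.parent === null) return;
    const siblings = this.parent.children;
    siblings.splice(siblings.indexOf(this), 1);
    this.parent = null;
  }

  // Move the given nodes to the end of this node's children.
  append(...nodes: ModelNode[]): this {
    for (const node of nodes) {
      node.remove();
      node.parent = this;
      this.children.push(node);
    }
    return this;
  }
}

// Hypothetical transform body for one list item.
function fixListItem(item: ModelNode): void {
  const parent = item.parent;
  // The cheap check: nothing to do when the parent is already a list.
  if (parent === null || parent.type === "list") return;

  const index = parent.children.indexOf(item);
  const previous = parent.children[index - 1];
  if (previous !== undefined && previous.type === "list") {
    previous.append(item); // join the list directly before this item
    return;
  }
  // Otherwise put a new list in the item's slot and move the item into it.
  const list = new ModelNode("list");
  parent.children.splice(index, 0, list);
  list.parent = parent;
  list.append(item);
}

const quote = new ModelNode("quote");
const orphanItems = [
  new ModelNode("listitem"),
  new ModelNode("listitem"),
  new ModelNode("listitem"),
];
quote.append(...orphanItems);
orphanItems.forEach((node) => fixListItem(node)); // visited in document order
console.log(quote.children.map((c) => `${c.type}(${c.children.length})`));
```

This version depends on items being visited in document order; visiting them out of order can leave adjacent lists behind, which is where a separate merge step on the list itself would come in. Joining a preceding list also changes an existing list, so a real implementation would need to decide whether that is acceptable.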

duvallj added 3 commits November 10, 2025 19:29

wip: add new test case

310ad65

wip: add new invariant

a44ca84

pass test with new logic

12644c5

duvallj requested review from acywatson, etrepum, fantactuka, ivailop7, potatowagon, takuyakanbr and zurfyx as code owners November 11, 2025 01:06

vercel bot deployed to Preview – lexical November 11, 2025 01:07 View deployment

vercel bot deployed to Preview – lexical-playground November 11, 2025 01:08 View deployment

meta-cla bot added the CLA Signed label Nov 11, 2025

etrepum reviewed Nov 11, 2025


etrepum mentioned this pull request Nov 20, 2025

Bug: p tags in anchor tags are not handled correct #7977

Open
