@duvallj duvallj commented Nov 11, 2025

Description

In our use of Lexical, we often saw crashes with Lexical error 40, "A ListItemNode must have a ListNode for a parent." We inspected our code for places where we manipulate Lexical nodes manually in a way that could lead to this case, but found none. Then I wanted to see whether we could break this invariant using only regular Lexical functionality. It turns out we can: pasting invalid HTML leads to corrupted internal state which, if left unchecked, can trigger this error code.

This PR sanitizes nodes on HTML paste so that they aren't invalid. I chose this option because:

  1. Adding an extremely-recursive function to the "hot path" of insertNodes or insertAfter seems like a bad idea.
  2. This is likely how the bug was getting triggered for our users in the first place; Lexical seems to do an OK job of maintaining valid state elsewhere.
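
As a rough illustration of what "sanitizing on paste" means here, the sketch below wraps each run of orphaned list items in a list node. It uses a toy node shape (`ToyNode`, `toy`, `wrapOrphanListItems`, and `serializeToy` are all hypothetical names), not Lexical's real classes or the code in this PR, so treat it as a model of the idea only.

```typescript
// Toy stand-in for editor nodes; these are NOT Lexical's real classes.
interface ToyNode {
  type: string; // hypothetical type names: "quote", "list", "listitem", "text"
  text: string; // only meaningful for "text" nodes
  children: ToyNode[];
}

function toy(type: string, children: ToyNode[] = [], text = ''): ToyNode {
  return {type, text, children};
}

// Recursively rebuild the tree. Whenever a parent that is not a "list"
// has "listitem" children, each run of consecutive items is wrapped in
// one new "list" node.
function wrapOrphanListItems(parent: ToyNode): ToyNode {
  const fixed: ToyNode[] = [];
  let run: ToyNode[] = [];
  const flushRun = (): void => {
    if (run.length > 0) {
      fixed.push(toy('list', run));
      run = [];
    }
  };
  for (const child of parent.children) {
    const normalized = wrapOrphanListItems(child);
    if (normalized.type === 'listitem' && parent.type !== 'list') {
      run.push(normalized);
    } else {
      flushRun();
      fixed.push(normalized);
    }
  }
  flushRun();
  return {...parent, children: fixed};
}

// Minimal serializer so the result can be inspected as a string.
function serializeToy(n: ToyNode): string {
  if (n.type === 'text') {
    return n.text;
  }
  return `<${n.type}>${n.children.map(serializeToy).join('')}</${n.type}>`;
}

// Mirrors the shape of the failing test case: list items directly inside a quote.
const pastedQuote = toy('quote', [
  toy('listitem', [toy('text', [], 'Item A')]),
  toy('listitem', [toy('text', [], 'Item B')]),
  toy('listitem', [toy('text', [], 'Item C')]),
]);
console.log(serializeToy(wrapOrphanListItems(pastedQuote)));
```

The pass runs once per paste and never touches items that already sit inside a list, which is the trade-off argued for in the two points above.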

Test plan

npm run test-unit

Before

> nix-shell --command "npm run test-unit packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts" -p nodejs corepack
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring

> @lexical/[email protected] test-unit
> vitest --no-watch packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts


 RUN  v3.2.4 /Users/jackduvall/lexical

 ❯  unit  packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts (12 tests | 1 failed) 71ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: plain DOM text node 21ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a paragraph element 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a single div 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: multiple nested spans and divs 4ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested span in a div 3ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested div in a span 2ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: google doc checklist 11ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: github checklist 6ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: joplin checklist 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: pasting inheritance 2ms
   × HTMLCopyAndPaste tests > HTML copy paste: invalid list node correction 5ms
     → expected '<blockquote dir="auto"><li value="1">…' to be '<blockquote dir="auto"><ul><li value=…' // Object.is equality
   ✓ HTMLCopyAndPaste tests > iOS fix: Word predictions should be handled as plain text to maintain selection formatting 1ms

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ Failed Tests 1 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

 FAIL   unit  packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts > HTMLCopyAndPaste tests > HTML copy paste: invalid list node correction
AssertionError: expected '<blockquote dir="auto"><li value="1">…' to be '<blockquote dir="auto"><ul><li value=…' // Object.is equality

Expected: "<blockquote dir="auto"><ul><li value="1"><span data-lexical-text="true">Item A</span></li><li value="2"><span data-lexical-text="true">Item B</span></li><li value="3"><span data-lexical-text="true">Item C</span></li></ul></blockquote>"
Received: "<blockquote dir="auto"><li value="1"><span data-lexical-text="true">Item A</span></li><li value="1"><span data-lexical-text="true">Item B</span></li><li value="1"><span data-lexical-text="true">Item C</span></li></blockquote>"

 ❯ packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts:142:37
    140|             }
    141|           });
    142|           expect(testEnv.innerHTML).toBe(testCase.expectedHTML);
       |                                     ^
    143|         });
    144|       });

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[1/1]⎯


 Test Files  1 failed (1)
      Tests  1 failed | 11 passed (12)
   Start at  20:00:35
   Duration  850ms (transform 311ms, setup 12ms, collect 493ms, tests 71ms, environment 117ms, prepare 22ms)

After

> nix-shell --command "npm run test-unit packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts" -p nodejs corepack
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring

> @lexical/[email protected] test-unit
> vitest --no-watch packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts


 RUN  v3.2.4 /Users/jackduvall/lexical-cursor

 ✓  unit  packages/lexical/src/__tests__/unit/HTMLCopyAndPaste.test.ts (12 tests) 65ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: plain DOM text node 21ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a paragraph element 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: a single div 4ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: multiple nested spans and divs 3ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested span in a div 3ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: nested div in a span 2ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: google doc checklist 9ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: github checklist 6ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: joplin checklist 5ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: pasting inheritance 2ms
   ✓ HTMLCopyAndPaste tests > HTML copy paste: invalid list node correction 2ms
   ✓ HTMLCopyAndPaste tests > iOS fix: Word predictions should be handled as plain text to maintain selection formatting 1ms

 Test Files  1 passed (1)
      Tests  12 passed (12)
   Start at  19:53:06
   Duration  810ms (transform 308ms, setup 12ms, collect 481ms, tests 65ms, environment 114ms, prepare 24ms)

meta-cla bot commented Nov 11, 2025

Hi @duvallj!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

vercel bot commented Nov 11, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
lexical | Ready | Preview | Comment | Nov 11, 2025 1:08am
lexical-playground | Ready | Preview | Comment | Nov 11, 2025 1:08am

meta-cla bot commented Nov 11, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

meta-cla bot added the CLA Signed label Nov 11, 2025

Collaborator

@etrepum etrepum left a comment

This doesn't seem like a very good approach to solving this problem: it's unexpected and inefficient to call createParentElementNode unless it's known that one needs to be created. Generally these kinds of normalizations are handled by transforms and/or the importDOM implementations.

Author

duvallj commented Nov 11, 2025

@etrepum I think I see what you mean; you're saying I should hook into existing transform machinery by returning a forChild function or after function from an importDOM implementation, is that correct?

I can't see a great way for that to work. (1) The only importDOM we can reasonably modify is the one in ListItemNode, because nothing else has any reason to care about <li> DOM nodes. However, when we do that, (2) it's not enough to use forChild, because that only applies to children of the <li>, which is too late, and (3) a transform that uses after would have to traverse the tree again to fix up the <li> it just created.

I agree that it's probably not good that we're calling createParentElementNode() on every single node; I should restructure things so that it's only called on nodes where isParentRequired() is true. When it is true, however, I think we will always have to call it in order to check that the constructors match. We could also introduce a new function, getParentElementNodeConstructor(), on LexicalNode to optimize this further; I left that out for simplicity's sake.
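
The restructuring described above could look roughly like the sketch below. It only borrows the method names isParentRequired() and createParentElementNode() from the discussion; the classes (`PlainNode`, `SketchList`, `SketchListItem`), the `needsWrapping` helper, and the `createCalls` counter are hypothetical stand-ins, not Lexical code.

```typescript
// Counts how often a parent node is instantiated, to make the cost visible.
let createCalls = 0;

// Hypothetical base class borrowing two method names from the discussion.
class PlainNode {
  isParentRequired(): boolean {
    return false;
  }
  createParentElementNode(): PlainNode {
    createCalls++;
    return new PlainNode();
  }
}

class SketchList extends PlainNode {}

class SketchListItem extends PlainNode {
  override isParentRequired(): boolean {
    return true;
  }
  override createParentElementNode(): PlainNode {
    createCalls++;
    return new SketchList();
  }
}

// Only nodes that declare a required parent pay for creating one; the
// created node is used solely to compare constructors with the actual parent.
function needsWrapping(child: PlainNode, parent: PlainNode): boolean {
  if (!child.isParentRequired()) {
    return false; // cheap exit for the vast majority of nodes
  }
  const required = child.createParentElementNode();
  return parent.constructor !== required.constructor;
}

const sketchQuote = new PlainNode();
const sketchItem = new SketchListItem();
console.log(needsWrapping(sketchItem, sketchQuote)); // true
console.log(needsWrapping(sketchItem, new SketchList())); // false
console.log(needsWrapping(new PlainNode(), sketchQuote)); // false
```

A getParentElementNodeConstructor()-style accessor, as proposed above, would remove the remaining allocation by returning the constructor directly instead of an instance.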

Collaborator

etrepum commented Nov 11, 2025

By transforms I meant Node Transforms, which are typically used to provide various sorts of normalization regardless of how the nodes are created.

Collaborator

etrepum commented Nov 11, 2025

There are already a few transforms registered for ListNode and ListItemNode, but none of them currently handle this specific scenario. Here's one of them: https://github.com/facebook/lexical/blob/main/packages/lexical-list/src/index.ts#L126-L152

Author

duvallj commented Nov 11, 2025

@etrepum Ah OK, those links are a helpful explanation, thanks.

I still don't see how a node transform is well-suited to solve the problem at hand, however. There are a couple approaches I see:

  1. Register on ListItemNode to surround it with a ListNode if it doesn't have one for a parent already, and register on ListNode to merge with sibling ListNodes.
  2. Register on ElementNode to check if it is a non-ListNode with ListItemNode direct children, and fixup if that's the case.

(2) is so inefficient that I think we can disregard it. (1) is better since, as you say, it ensures that ListItemNodes will be valid no matter how they are inserted into the editor.

It's hard to compare the exact perf difference between "thing that runs once on DOM import" and "thing that runs every time a list is updated." Though I guess in general, the former might happen more often across the userbase than the latter.
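
To make approach (1) concrete, here is a small model of its two steps: wrap an orphaned list item in its own list, then merge lists that end up adjacent. It works on a toy immutable tree (`TNode`, `wrapIfOrphan`, `mergeAdjacentLists`, and `normalizeChildren` are hypothetical names); real node transforms mutate the editor state in place, so this is only a sketch of the logic.

```typescript
// Toy immutable tree; type names are hypothetical, not Lexical's.
interface TNode {
  type: string;
  text: string;
  children: TNode[];
}

const tnode = (type: string, children: TNode[] = [], text = ''): TNode => ({
  type,
  text,
  children,
});

// Step 1: roughly what a transform registered on list items would do.
function wrapIfOrphan(item: TNode, parent: TNode): TNode {
  return parent.type === 'list' ? item : tnode('list', [item]);
}

// Step 2: roughly what a transform registered on lists would do,
// folding each list into a directly preceding sibling list.
function mergeAdjacentLists(children: TNode[]): TNode[] {
  const out: TNode[] = [];
  for (const child of children) {
    const prev = out[out.length - 1];
    if (prev !== undefined && prev.type === 'list' && child.type === 'list') {
      out[out.length - 1] = {
        ...prev,
        children: [...prev.children, ...child.children],
      };
    } else {
      out.push(child);
    }
  }
  return out;
}

// Apply both steps to the direct children of one parent.
function normalizeChildren(parent: TNode): TNode {
  const wrapped = parent.children.map((c) =>
    c.type === 'listitem' ? wrapIfOrphan(c, parent) : c,
  );
  return {...parent, children: mergeAdjacentLists(wrapped)};
}

const brokenQuote = tnode('quote', [
  tnode('listitem', [tnode('text', [], 'Item A')]),
  tnode('listitem', [tnode('text', [], 'Item B')]),
  tnode('paragraph', [tnode('text', [], 'between')]),
  tnode('listitem', [tnode('text', [], 'Item C')]),
]);
// Resulting child types: list (2 items), paragraph, list (1 item)
console.log(normalizeChildren(brokenQuote).children.map((c) => c.type));
```

Without the merge step, three pasted items would become three single-item lists, which is why approach (1) needs both registrations.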

Collaborator

etrepum commented Nov 11, 2025

The check for (1) is fairly cheap: there's already code running every time a list item is updated; it just doesn't check what its parent is. Adding an if (!$isListNode(this.getParent())) { … } in there somewhere isn't going to make much of a practical difference. The hard part is deciding what to do when fixing up an invalid doc, which is probably why nobody has implemented a default strategy for it (e.g. split the parent block and insert a ListNode where the invalid children are, replace the ListItemNode with its children and a LineBreakNode, drop it altogether, etc.).
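
The difference between the fix-up strategies is easier to see side by side. The sketch below models two of them, unwrapping an orphaned item into its children plus a line break and dropping it altogether, on a toy tree. `MiniNode`, `OrphanStrategy`, and `fixOrphans` are hypothetical names, and none of this reflects an existing Lexical default.

```typescript
// Two of the possible fix-up strategies for an orphaned list item.
type OrphanStrategy = 'unwrap' | 'drop';

// Toy tree; type names are hypothetical, not Lexical's.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

const mini = (type: string, children: MiniNode[] = [], text = ''): MiniNode => ({
  type,
  text,
  children,
});

// Fix list items that sit directly under a non-list parent.
function fixOrphans(parent: MiniNode, strategy: OrphanStrategy): MiniNode {
  if (parent.type === 'list') {
    return parent; // items under a list are already valid
  }
  const children: MiniNode[] = [];
  for (const child of parent.children) {
    if (child.type !== 'listitem') {
      children.push(child);
      continue;
    }
    if (strategy === 'unwrap') {
      // Keep the item's content inline, separated by a line break.
      children.push(...child.children, mini('linebreak'));
    }
    // 'drop': the orphaned item and its content are discarded.
  }
  return {...parent, children};
}

const orphanedQuote = mini('quote', [
  mini('listitem', [mini('text', [], 'Item A')]),
  mini('listitem', [mini('text', [], 'Item B')]),
]);
console.log(fixOrphans(orphanedQuote, 'unwrap').children.map((c) => c.type));
console.log(fixOrphans(orphanedQuote, 'drop').children.length);
```

Unwrapping preserves the pasted text but loses the list structure, while dropping loses the content entirely, which illustrates why picking a default is a product decision rather than a purely technical one.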
