Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't allow block level elements inside an unclosed <p> tag #88

Open
DAreRodz opened this issue Feb 26, 2018 · 4 comments
Open

Don't allow block level elements inside an unclosed <p> tag #88

DAreRodz opened this issue Feb 26, 2018 · 4 comments

Comments

@DAreRodz
Copy link

It would be nice that Himalaya handles weird cases like browsers do, e.g. the following code:

<body>
  <p>
  <div></div>
</body>

In Himalaya, the resulting JSON is an element <p> wrapping an element <div>, like in this case:

<body>
  <p>
    <div></div>
  </p>
</body>

But browsers close the <p> tag before opening the <div>.

<body>
  <p></p>
  <div></div>
</body>

This is because a <p> element’s end tag may be omitted if the element is immediately followed by some tags, as indicated in the HTML5 spec.

@andrejewski
Copy link
Owner

Himalaya does account for some of these cases with closingTagAncestorBreakers, where we could add { p: [...blockLevelTags] }.

I think my hesitation to include all possible rules stems from not wanting to keep updating a list of different HTML tag categories. I will reconsider. The issues on my mind are:

  • the maintenance cost of the lists
  • trying to avoid external depedencies
  • providing a better, documented escape hatch for custom tag parsing rules

It may be worth unblocking your work with the above defaults change. I am working on a few projects right now so this is not a high priority and will take time to find a good solution.

If anyone has thoughts, feel free to share them here.

@DAreRodz
Copy link
Author

I'm wondering if you would accept a pull request for solving this particular issue, and maybe in the future for other possible cases that may come up.

The whole list of rules could be long but it's something that doesn't seem to change a lot (especially considering that the HTML specification maintains rules for historical reasons to keep running in older browsers), so I think it would be a good idea to handle most of the possible cases according to the specs.

We can help you with PR's anyway :)

@andrejewski
Copy link
Owner

@DAreRodz I think the fix to your issue is a few lines of code on your end:

import {parseDefaults} from 'himalaya'
parseDefaults.closingTagAncestorBreakers.p = ['div']

This is preferable to me because I don't want to make many patch releases as these types of issues are found. I would much rather put in the work upfront to make a one-shot release containing:

  • All spec quirks related to block/inline tags
  • Redesigned (but backwards compatible) parser options

This will take time, when I can find it. Let me know if the above workaround does not work.

@DAreRodz
Copy link
Author

DAreRodz commented Mar 2, 2018

@andrejewski I tried to make it work using the workaround, but it's still not working.

Looking at the code, it seems like the parsing algorithm automatically closes tags that are included in closingTags but only when an opening tag of the same type appears.

Moreover, closingTagAncestorBreakers is a way to prevent the auto-close behaviour mentioned above, so if I use the following line

parseDefaults.closingTagAncestorBreakers.p = ['div']

what I'm actually doing is allowing nested <p> tags when a <div> tag appears between them, and that's the opposite of what I'm trying to do!

I would suggest to replace closingTags array by an object and thus define a list of tags that could close an specific tag.

Anyway, this issue is not critical for us and we will continue using this library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants