Parser breaks with empty attributes or unquoted attribute values #2

Open
krautsource opened this issue Mar 13, 2023 · 1 comment

krautsource commented Mar 13, 2023

Hey,

first off, thanks a bunch for making this project available. It's exactly what I needed for a project of mine.

There doesn't seem to be a lot of development going on, but maybe this helps somebody who runs into the same problems I had.
The parser seems to have issues when an element contains an attribute without a value, or an attribute with an unquoted value (both of which are valid HTML, AFAIK).

Examples:

Missing attribute value:

dom = htmldom.HtmlDom()
dom.createDom("<div><p foo class='bar'>hello world</p><p>bye</p></div>")

dom.find("p.bar") # returns an empty list

Unquoted attribute value:

dom.createDom("<div><p foo=1 class='bar'>hello world</p><p>bye</p></div>")

dom.find("p.bar") # returns an empty list
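As a cross-check that both attribute forms are parseable, here is a minimal sketch using Python's standard-library html.parser rather than htmldom. The names AttrCollector and collect_p_attrs are made up for this illustration and are not part of either library.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attribute list of every <p> tag that has attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a valueless attribute has value None.
        if tag == "p" and attrs:
            self.seen.append(attrs)


def collect_p_attrs(html):
    parser = AttrCollector()
    parser.feed(html)
    parser.close()
    return parser.seen


# Same inputs as the two examples above.
valueless = collect_p_attrs("<div><p foo class='bar'>hello world</p><p>bye</p></div>")
unquoted = collect_p_attrs("<div><p foo=1 class='bar'>hello world</p><p>bye</p></div>")
print(valueless)
print(unquoted)
```

In both cases the class attribute is still reported for the first p element, which is the behaviour I would have expected from dom.find("p.bar") as well.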

For my use case I am currently working around this by retrieving the HTML source using requests, string-replacing the known offending attribute with an empty string, and then feeding the result into createDom().
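A minimal sketch of that workaround, assuming the offending attribute text is known in advance. The helper name strip_known_attribute is made up for this illustration; the fetching step with requests and the final createDom() call are only indicated in comments, since they depend on the actual page.

```python
def strip_known_attribute(html, attribute_text):
    """Remove every literal occurrence of ' <attribute_text>' from the HTML string.

    This is a plain string replacement, not a parser: it would also remove the
    same text if it happened to appear in the page content, so it is only
    suitable when the offending attribute is known and distinctive.
    """
    return html.replace(" " + attribute_text, "")


# In the real workaround the source would come from something like
# requests.get(url).text; a literal string is used here instead.
html_valueless = "<div><p foo class='bar'>hello world</p><p>bye</p></div>"
html_unquoted = "<div><p foo=1 class='bar'>hello world</p><p>bye</p></div>"

cleaned_valueless = strip_known_attribute(html_valueless, "foo")
cleaned_unquoted = strip_known_attribute(html_unquoted, "foo=1")
print(cleaned_valueless)
print(cleaned_unquoted)

# The cleaned string is then what gets passed to dom.createDom(...).
```

Both inputs reduce to the same markup with only the quoted class attribute left on the first p element.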

@re-masashi

Can confirm that this happens.
