HtmlHandler, for normalizing tag cases #24
Closed
fix of htmlparser.DomUtils.getOuterHTML for directives
yep, it's insanely short
to get a signal when there won't be any more attributes coming
they are now available as `domhandler`
'case numbers are faster to compare NOT breaking due to last commit
Attention: The DOM changes slightly.
…quoted attribute values. Require self-closing tags to be void
…g the attributes count. Here's a different way to accomplish the same thing.
This reverts commit 181c31b.
This reverts commit f7b6d54.
…close is implied by other tags being opened, and these are closed when those tags are opened. This helps correctly parse things like lists and tables with unterminated LI or TD tags.
…correct spacing (and tried to match that)
also fixed some semantics
also replaced call to `Array#slice` with setting the stack's `length` property
as required by mocha
failed previously (only for FeedHandler tests), fixed now due to DomHandler upgrade (which removed the `ignoreWhitespace` option)
as requested in fb55/css-select#11
as requested in tautologistics#70
As I thought through #20 and #22, I realized that the problem was not with the parser itself but with the results it produced. Rather than hacking on the parser and risking breakage of things like RSS/XML support, I decided a better approach would be to create another handler, called HtmlHandler. It embraces the case-insensitive nature of HTML tags and converts all tag names to upper case with toUpperCase(), to respect the standard. When reserializing, the printHtml method (provided by tomdz) now converts all tag names to lower case with toLowerCase(), because it's printing HTML, not XML/RSS.
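To illustrate the idea, here is a minimal sketch of the two steps (upper-casing tag names in the handler output, lower-casing them again on serialization). It is not the code from this pull request: `normalizeTagNames` and `printHtmlSketch` are hypothetical names, and the node shape (`type`, `name`, `attribs`, `children`, `data`) is an assumption made for the example.

```javascript
// Hypothetical node shape, assumed for this sketch:
//   { type: "tag", name, attribs, children }  or  { type: "text", data }

// Upper-case every tag name in place, recursing into children.
function normalizeTagNames(nodes) {
  for (const node of nodes) {
    if (node.type === "tag") {
      node.name = node.name.toUpperCase();
      normalizeTagNames(node.children || []);
    }
  }
  return nodes;
}

// Serialize the tree back to markup, lower-casing tag names.
// Attribute values are written as-is (no escaping) to keep the sketch short.
function printHtmlSketch(nodes) {
  return nodes
    .map((node) => {
      if (node.type !== "tag") return node.data || "";
      const tag = node.name.toLowerCase();
      const attrs = Object.entries(node.attribs || {})
        .map(([key, value]) => ` ${key}="${value}"`)
        .join("");
      return `<${tag}${attrs}>${printHtmlSketch(node.children || [])}</${tag}>`;
    })
    .join("");
}

// Mixed-case input, as a parser might report it for <DiV id="a"><sPaN>hi</sPaN></DiV>.
const dom = normalizeTagNames([
  {
    type: "tag",
    name: "DiV",
    attribs: { id: "a" },
    children: [
      { type: "tag", name: "sPaN", attribs: {}, children: [{ type: "text", data: "hi" }] },
    ],
  },
]);
```

With this shape, code that inspects the tree can compare against a single canonical spelling (`"DIV"`, `"SPAN"`), while the serialized output stays in conventional lower-case HTML.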
I've updated all existing tests and added a few new ones that cover tags with mixed case. This fork is currently in production on https://citational.com.
Please let me know what you think; I'm more than willing to hear alternate opinions!