Expected Behavior
Postlight Parser should preserve all the actual content of the page.
Current Behavior
Postlight Parser will get rid of any bulleted / numbered lists which consist mostly of links.
Steps to Reproduce
Run Postlight Parser on https://faultlore.com/blah/defaults-affect-inference. The bulleted list a bit after the 'Some Wild Shit Swift Does' heading gets removed.
[Image: picture of the list in question]
Detailed Description
This is the code that causes the problem:
parser/src/utils/dom/clean-tags.js, lines 43 to 73 (at commit e8ba7ec)
It's trying to get rid of menus and similar link-heavy navigation.
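A common way to measure whether a list "consists mostly of links" is link density: the share of its text that sits inside links. The sketch below assumes the check works along those lines. The list model (items as arrays of { text, href } segments) and the linkDensity function are hypothetical and only illustrate the idea; they are not the parser's actual data structures.

```javascript
// Hypothetical model: a list is an array of items, and each item is an array
// of text segments. A segment with an `href` counts as link text.
// Whitespace is not normalized here; a real implementation would likely
// collapse it before measuring.
function linkDensity(items) {
  let totalChars = 0;
  let linkChars = 0;
  for (const item of items) {
    for (const segment of item) {
      totalChars += segment.text.length;
      if (segment.href !== undefined) linkChars += segment.text.length;
    }
  }
  // Avoid dividing by zero for empty lists.
  return totalChars === 0 ? 0 : linkChars / totalChars;
}

// A navigation menu: every character is link text.
const menu = [
  [{ text: "Home", href: "/" }],
  [{ text: "Archive", href: "/archive" }],
];

// A content list: each item is a short link followed by explanatory text.
const contentList = [
  [{ text: "Link", href: "#a" }, { text: " followed by explanatory text" }],
];

console.log(linkDensity(menu)); // 1
console.log(linkDensity(contentList));
```

Short content links like the ones in the example page can still push a list's density high enough to look like a menu, which is the failure described in this issue.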
Possible Solution
The easiest solution would be to take the special case from the weight >= 25 branch of the code above, which keeps any list that comes after a paragraph ending in a colon, and apply it to the weight < 25 branch as well. (The lists that get removed fall into the weight < 25 case, which is why that special case doesn't already save them.)
Another solution I thought of would be to look at either the average or the maximum length of the links in a list (or table, div, or anything else the tag-cleaning code is applied to), and keep it if that length is above some threshold. In theory, that should differentiate between short links in menus and longer, sentence-length links in content; but looking at the example I provided again, those links are actually quite short, so this might not work as well as I'd hoped.
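The second idea (keeping a list whose links are long) could look roughly like this. linkTextStats, looksLikeContentByLinkLength and the 30-character default threshold are hypothetical choices for illustration, not anything the parser provides.

```javascript
// Hypothetical helper: average and maximum length of a list's link texts.
function linkTextStats(linkTexts) {
  // Collapse whitespace before measuring each link's text.
  const lengths = linkTexts.map((t) => t.replace(/\s+/g, " ").trim().length);
  if (lengths.length === 0) return { average: 0, max: 0 };
  const sum = lengths.reduce((acc, n) => acc + n, 0);
  return { average: sum / lengths.length, max: Math.max(...lengths) };
}

// Treat a link-heavy node as content if its links look sentence-length
// rather than menu-length. The default threshold is an arbitrary placeholder.
function looksLikeContentByLinkLength(
  linkTexts,
  { useMax = false, threshold = 30 } = {}
) {
  const { average, max } = linkTextStats(linkTexts);
  return (useMax ? max : average) > threshold;
}

console.log(looksLikeContentByLinkLength(["Home", "About", "Archive"])); // false
console.log(
  looksLikeContentByLinkLength([
    "A sentence-length link that describes a whole idea",
  ])
); // true
```

Using the maximum (useMax) is more lenient, since a single long link is enough to keep the whole list; using the average is stricter. Either way, short content links like the ones in the example would fall below any threshold high enough to exclude menus.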
So yeah, probably that first solution. I've already implemented it at https://github.com/Liamolucko/postlight-parser/tree/fix-link-lists and confirmed that it works.
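A minimal sketch of the first solution, using a simplified, hypothetical node model ({ tagName, linkDensity, previousText }) rather than the parser's real DOM nodes. The density thresholds are placeholders and not necessarily the values in clean-tags.js; the linked branch holds the actual change.

```javascript
// Placeholder thresholds for illustration only.
const LOW_WEIGHT_MAX_DENSITY = 0.2;
const HIGH_WEIGHT_MAX_DENSITY = 0.5;

// True when the node is a <ul>/<ol> whose preceding text ends in ':'.
function isListIntroducedByColon(node) {
  const tag = node.tagName.toLowerCase();
  if (tag !== "ul" && tag !== "ol") return false;
  // Collapse whitespace so trailing spaces or newlines don't hide the colon.
  const prev = (node.previousText ?? "").replace(/\s+/g, " ").trim();
  return prev.endsWith(":");
}

// Decide whether a link-heavy node should be removed as likely boilerplate.
function shouldRemoveLinkHeavy(node, weight) {
  const maxDensity =
    weight >= 25 ? HIGH_WEIGHT_MAX_DENSITY : LOW_WEIGHT_MAX_DENSITY;
  if (node.linkDensity <= maxDensity) return false;
  // The fix: the colon special case now applies to both weight branches,
  // not only to weight >= 25.
  return !isListIntroducedByColon(node);
}

const introducedList = {
  tagName: "UL",
  linkDensity: 0.8,
  previousText: "The following cases:\n",
};
const navMenu = { tagName: "UL", linkDensity: 0.8, previousText: "" };

console.log(shouldRemoveLinkHeavy(introducedList, 10)); // false
console.log(shouldRemoveLinkHeavy(navMenu, 10)); // true
```

Both lists have the same low weight and high link density; only the colon-terminated text before the first one keeps it.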