Commit
Avoiding CDATA nonsense
Yomguithereal committed Feb 15, 2024
1 parent 50571ac commit c2e12a9
Showing 2 changed files with 3 additions and 5 deletions.
2 changes: 1 addition & 1 deletion requirements.txt
@@ -17,7 +17,7 @@ charset-normalizer==3.3.2
 dateparser==1.1.6
 ebbe==1.13.2
 json5==0.9.11
-lxml>=4.9.2,<5
+lxml>=4.9.2
 nanoid==2.0.0
 playwright==1.35.0
 playwright_stealth==1.0.5
6 changes: 2 additions & 4 deletions test/scraper_test.py
@@ -97,6 +97,8 @@
 </table>
 """
 
+
+# NOTE: CDATA is handled very differently depending on lxml & bs4 versions
 THE_WORST_HTML = """
 <div>Some text isn't
 it?
@@ -136,8 +138,6 @@
 <li>Other</li>
 <li>Again</li>
 </ol>
-<p>
-<![CDATA[some very interesting stuff]]></p>
 <p>
 This is <span>a large span </span>
 with something else over <strong>here</strong>.
@@ -997,8 +997,6 @@ def clean(t):
 Other
 Again
-some very interesting stuff
 This is a large span with something else over here.
 Hello gorgeous!
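
For context: the commit sidesteps the version-dependent behaviour by dropping the CDATA paragraph from the fixture, so the test no longer depends on how lxml or bs4 treat it. If such input had to stay in a fixture, one possible alternative (not what this commit does) would be to unwrap CDATA sections before parsing. A minimal stdlib-only sketch, where `unwrap_cdata` and `CDATA_RE` are hypothetical names introduced here for illustration:

```python
import re
from html import escape

# Hypothetical helper, not part of this commit or of lxml/bs4.
# Matches a CDATA section and captures its inner text (non-greedy,
# across newlines).
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def unwrap_cdata(html: str) -> str:
    """Replace each <![CDATA[...]]> section with its escaped inner text."""
    # Escape &, < and > so the unwrapped text cannot be re-read as markup.
    return CDATA_RE.sub(lambda m: escape(m.group(1), quote=False), html)


if __name__ == "__main__":
    print(unwrap_cdata("<p>\n<![CDATA[some very interesting stuff]]></p>"))
```

With the CDATA wrapper gone, any parser version would see ordinary text inside the `<p>`, at the cost of no longer exercising CDATA handling at all.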
