Finding tags return entire html #69

Open
jz-ang opened this issue May 25, 2021 · 0 comments

Comments

jz-ang commented May 25, 2021

Describe the bug

On some websites, soup.find returns the entire HTML document instead of the matching tag(s).

Steps to reproduce the issue

Search for the ul tag with class="cves" (<ul class="cves">) on https://mariadb.com/kb/en/security/:

from gazpacho import Soup

endpoint = "https://mariadb.com/kb/en/security/"
html_dump = Soup.get(endpoint)  # fetch the page and parse it into a Soup
# mode='all' should return a list of every <ul class="cves"> element
sample = html_dump.find('ul', attrs={'class': 'cves'}, mode='all')

sample contains the entire HTML document rather than just the matching <ul> element(s).

Expected behavior

sample should contain the <ul class="cves"> element, i.e. the rows of <li> elements listing each CVE and the MariaDB version that fixes it, something like:

<ul class="cves">
  <li>..</li>
  ...
  <li>..</li>
</ul>
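For reference, the expected extraction can be sketched with only the standard library. This is a minimal sketch, not gazpacho's implementation: CveListParser is a hypothetical helper built on html.parser.HTMLParser, the sample markup is made up, and it assumes the page may omit the optional </li> end tags, so a new <li> (or the closing </ul>) is treated as ending the previous item.

```python
from html.parser import HTMLParser


class CveListParser(HTMLParser):
    """Hypothetical helper: collect the text of each <li> in <ul class="cves">.

    Assumption: </li> end tags may be omitted, so a new top-level <li>
    or the closing </ul> implicitly ends the previous item.
    """

    def __init__(self):
        super().__init__()
        self.rows = []         # finished <li> texts, whitespace-normalized
        self._in_cves = False  # currently inside <ul class="cves">?
        self._ul_depth = 0     # depth of <ul> lists nested inside the target
        self._current = None   # text parts of the open <li>, or None

    def _close_item(self):
        # Finish the open item (if any) and store its normalized text.
        if self._current is not None:
            self.rows.append(" ".join("".join(self._current).split()))
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            classes = (dict(attrs).get("class") or "").split()
            if self._in_cves:
                self._ul_depth += 1  # a nested list inside an item
            elif "cves" in classes:
                self._in_cves = True
                self._ul_depth = 0
        elif tag == "li" and self._in_cves and self._ul_depth == 0:
            self._close_item()  # implied </li> for the previous item
            self._current = []

    def handle_endtag(self, tag):
        if not self._in_cves:
            return
        if tag == "li" and self._ul_depth == 0:
            self._close_item()
        elif tag == "ul":
            if self._ul_depth:
                self._ul_depth -= 1
            else:
                self._close_item()  # </ul> also ends the last item
                self._in_cves = False

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


# Hypothetical markup: the first <li> in the cves list omits its end tag.
sample_html = """
<ul class="nav"><li>Home</li></ul>
<ul class="cves">
  <li>CVE-0000-0001: fixed in X
  <li>CVE-0000-0002: fixed in Y</li>
</ul>
"""
parser = CveListParser()
parser.feed(sample_html)
parser.close()
print(parser.rows)  # ['CVE-0000-0001: fixed in X', 'CVE-0000-0002: fixed in Y']
```

The <li> in the unrelated nav list is ignored because only the list whose class attribute contains "cves" is tracked.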

Environment:

  • OS: Ubuntu Linux 18.04
  • Version: gazpacho 1.1, Python 3.6.9

Additional information

Running BeautifulSoup on the same html_dump works, although the <li> tags come back oddly nested.

from bs4 import BeautifulSoup

# html_dump is the Soup returned by Soup.get(endpoint) above
bs_soup = BeautifulSoup(html_dump.html, 'html.parser')
# match every <ul> whose CSS class includes "cves"
ul_cves = bs_soup.find_all('ul', class_='cves')

ul_cves contains strangely nested <li> elements, but it was still possible to extract the rows of <li> elements I was looking for.

<ul class="cves">
  <li>
    <li>
    ...
  </li></li>
</ul>
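One plausible reading of the nesting above, assuming (not verified against the live page) that the page omits the optional </li> end tags: a tree builder that opens an element on every start tag and closes it only on an explicit end tag will nest each <li> inside the previous one, which appears to be what the 'html.parser' backend does here. The sketch below uses a hypothetical LiDepthRecorder on top of the standard library's html.parser to count how many <li> elements are still open when each new one starts. A backend that applies HTML's implied end tag rules, such as BeautifulSoup(html_dump.html, 'html5lib') (requires the html5lib package), would be expected to produce siblings instead.

```python
from html.parser import HTMLParser


class LiDepthRecorder(HTMLParser):
    """Hypothetical helper: record how many <li> are open when each <li> starts.

    Like html.parser itself, it does not infer omitted end tags: an <li>
    stays open until an explicit </li> arrives.
    """

    def __init__(self):
        super().__init__()
        self.open_li = 0  # <li> elements currently open
        self.depths = []  # open-<li> count at each <li> start tag

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.depths.append(self.open_li)
            self.open_li += 1

    def handle_endtag(self, tag):
        if tag == "li" and self.open_li:
            self.open_li -= 1


def li_depths(html):
    recorder = LiDepthRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.depths


# Hypothetical markup: explicit end tags vs. omitted (but valid) end tags.
closed = '<ul class="cves"><li>a</li><li>b</li><li>c</li></ul>'
unclosed = '<ul class="cves"><li>a<li>b<li>c</ul>'

print(li_depths(closed))    # [0, 0, 0]: the items are siblings
print(li_depths(unclosed))  # [0, 1, 2]: each <li> sits inside the previous one
```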