Question in ch2 #76

Open
shufanzhang opened this issue Jul 16, 2019 · 1 comment

Comments

@shufanzhang

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bs = BeautifulSoup(html, "html.parser")
nameList = bs.find_all(text='the prince')
print(len(nameList))

I ran the code above and the result is 7. However, when I use Ctrl+F to search for 'the prince' in the browser, the result is 11. I'm confused about why the results are inconsistent.

@Proteusiq

That is because of casing. You have only captured 'the prince' but left out 'The prince' :) I got 11 by doing something similar, but with requests. You can also just pass find_prince to find_all in your original code and it will work too.

import re

import requests
from bs4 import BeautifulSoup

URL = "http://www.pythonscraping.com/pages/warandpeace.html"

# ignoring casing
find_prince = re.compile(r'the prince', re.IGNORECASE)

s = requests.Session()
r = s.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')

prince_found = soup.find_all(text=find_prince)

print(len(prince_found))  # 11
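
To see the difference without hitting the network, here is a minimal sketch on a made-up HTML snippet (the markup and variable names are illustrative assumptions, not taken from the book's page). As far as I understand it, a plain string passed to find_all has to equal a whole text node, while a compiled regex is searched inside each text node. `string=` is the newer name for the `text=` argument used above.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only.
html = (
    "<p><span>the prince</span> spoke. "
    "<span>The prince</span> left. "
    "Later the prince met the prince's son.</p>"
)

soup = BeautifulSoup(html, "html.parser")

# A plain string only matches text nodes that are exactly 'the prince'.
exact = soup.find_all(string="the prince")

# A compiled regex is searched inside every text node, ignoring case here.
pattern = re.compile(r"the prince", re.IGNORECASE)
regex_nodes = soup.find_all(string=pattern)

# Counting occurrences in the flattened text is closer to what Ctrl+F does:
# a single text node can contain the phrase more than once.
occurrences = len(pattern.findall(soup.get_text()))

print(len(exact), len(regex_nodes), occurrences)
```

Note that find_all counts matching text nodes, not occurrences, so the regex count and a Ctrl+F count can still differ on other pages even though they agree at 11 on this one.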
